What's with the version numbers?
The sources for the charts are of such varied levels of quality that there are
bound to be mistakes in the way the data is assembled. There is a continual
effort to fix the most obvious problems in the data, and to add in additional
charts as they become available. The version number tracks these changes, so
if an issue is identified it can be tied to the version of the data that it
was fixed in.
The version number of this particular data set is 2.8.0052. There is a
CSV File that contains all the song and album entries
listed on the site. The file csv/top5000songs-2-8-0052.csv contains
a listing of the top 5000 songs (in case 1000 is not enough for you) and the file
csv/top3000albums-2-8-0052.csv contains
a listing of the top 3000 albums.
CSV File: tsort-chart-2-8-0052.csv
The file tsort-chart-2-8-0052.csv contains a complete set of the data
published on this site in a form that is both convenient (since it is in a single place)
and easy to manipulate. Each entry in the file has the following columns:
- artist: The name of the artist
- name: The name of the entry (song or album title)
- type: The type of the entry, either song or album
- year: The year allocated to the entry (may be unknown)
- score: The score that this item achieved. This number may be preceded by a space, which allows
a text sort to get the values in the appropriate order
- songentry_pos: The song's position in the Top Songs
list (or the empty string if it is not in it)
- songyear_pos: The song's position in its Song Year list
- songartist_pos: The artist's position in the Top Artists list
(note that only the artist's top song is labeled in this way)
- songtitle_pos: The song title's position in the Top Titles list
(note that only the most highly placed cover version is labeled for each title)
- songdecade_pos: The song's position in its Song Decade list
- namsong_pos: The song's position in the Top North American Songs list
for its year
- eursong_pos: The song's position in the Top European Songs list
for its year
- albumentry_pos: The album's position in the Top Albums list
- albumyear_pos: The album's position in its Album Year list
- albumartist_pos: The artist's position in the Top Album Artists list
(note that only the artist's top album is labeled in this way)
- albumdecade_pos: The album's position in its Album Decade list
- notes: The chart entries for this item (this follows the templates defined on the
Song Charts and Album Charts pages)
Technical Details: The format used is a conventional "Comma Separated Values"
file that uses strictly ASCII encoding (i.e. bottom 7 bits only). The Windows <CR><LF> line end
sequence is used (0x0D 0x0A). The first row defines the names of the columns using lower case alpha
characters plus underscore (code 0x5F). Within the actual data every item is enclosed within double quotes
(code 0x22). There are no escape sequences in the data since the double quote character does not
appear in any data value (at least it shouldn't). However, single quotes (code 0x27) and commas
(code 0x2C) do occur within the data.
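As a minimal sketch of reading a file in this format, Python's standard csv module copes with the quoting and line ends described above. The function names and the sample rows below are made up for illustration; they are not part of the site's own tooling or data.

```python
import csv
import io

def read_chart_rows(stream):
    """Return the data rows as dicts keyed by the column names in the first row."""
    # The default csv dialect already treats the double quote (0x22) as the
    # quote character, so commas and single quotes inside values are safe.
    return list(csv.DictReader(stream))

def open_chart_file(path):
    """Open a chart CSV file for use with read_chart_rows()."""
    # encoding="ascii" matches the 7-bit encoding; newline="" leaves the
    # <CR><LF> line ends for the csv module to handle.
    return open(path, "r", encoding="ascii", newline="")

# Made-up sample rows (not real site data) in the format described above.
SAMPLE = (
    "artist,name,type,year,score\r\n"
    '"Example Artist","Don\'t Stop, Go","song","1970"," 9.500"\r\n'
    '"Example Artist","1-2-3","song","1970","10.250"\r\n'
)

rows = read_chart_rows(io.StringIO(SAMPLE, newline=""))
print(rows[0]["name"])         # the comma and single quote survive intact
print(repr(rows[0]["score"]))  # the leading space is kept; float() accepts it
print(rows[1]["name"])         # stays as text, it is never turned into a date
```

Every value arrives as a plain string, so nothing is reinterpreted unless you convert it yourself.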
Note: If you load this data into Excel you will notice that some of the entries get screwed up, for example the
song "1-2-3" gets converted to "01/Feb/2003" (and hence to the number 37653). This is
because of the stupid way Excel treats things it thinks could possibly be dates. There are ways to stop it
doing that but you'll have to ask Google for details.
CSV Files: csv/top5000songs-2-8-0052.csv and csv/top3000albums-2-8-0052.csv
These two files list the top 5000 songs and 3000 albums (including some that are
not listed anywhere on the site). The listing showing which charts each entry was in has been removed; only the summary
scores are provided. Each entry in these files has the following columns:
- position: The item's position in the overall list
- artist: The name of the artist
- name: The name of the song or album
- year: The year allocated to the entry
- final_score: The final calculated score for the item (after factors have been applied)
- raw_usa: The raw score from the USA based charts
- raw_eng: The raw score from the non-USA English speaking countries (UK, Canada, Australia, New Zealand, South Africa, Ireland and Hong Kong)
- raw_eur: The raw score from the non-English speaking countries in Europe (Germany, France, Austria, Norway, Sweden, Italy, Switzerland,
Spain, Netherlands, Belgium, Poland, Denmark, Finland and the Vatican)
- raw_row: The raw score from the rest of the world
The four regions are given weights that match the total value of music industry sales, so usa 35%,
eng 20%, eur 25%, row 20%. In practice we don't have enough entries for each region in
every year; the way we fix that is described elsewhere.
If you want your spreadsheet to get the same final score as we do, then email us.
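To make the weights concrete, here is a rough sketch that combines the four raw columns using the percentages quoted above. It assumes a plain weighted sum and treats empty fields as zero; it deliberately ignores the year factors, so it should not be expected to reproduce the final_score column. The function name is hypothetical.

```python
# Regional weights quoted in the text above (35%, 20%, 25%, 20%).
REGION_WEIGHTS = {
    "raw_usa": 0.35,
    "raw_eng": 0.20,
    "raw_eur": 0.25,
    "raw_row": 0.20,
}

def weighted_raw_total(row):
    """Weighted sum of the four raw regional scores in one row (a dict).

    Assumption: a plain weighted sum with empty fields counted as zero.
    The year factors are not applied here, so this is not the site's
    full calculation.
    """
    return sum(
        weight * float(row[column] or 0)
        for column, weight in REGION_WEIGHTS.items()
    )
```

A row scoring 10 in the USA and nothing elsewhere would come out at 3.5 under this assumption.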
Year Factors File: csv/yearfactor-2-8-0052.csv
The current set of year factors is provided in this CSV file. For each year this has the factors
that are applied for songs and albums across each of the four regions.
Old CSV files
These contain the results of calculations of the top artists and songs
from older versions of the data. They are here only for historical reasons;
doing the same calculation with the most recent data would usually deliver
better results.
Old Factor Values
As the volume of data from different eras and locations changes we adjust the year factors
to balance out the scores. This is done by investigating the songs and albums in positions
1000-2000 and ensuring that after processing they have a reasonable distribution. This
technique ensures that the top 1000 items don't influence the factors, but forces us to
keep explicit values (rather than calculating the factors on the fly). Here are
the values of factor tables for various versions:
Version History
Here are some of the highlights of the released versions:
- 2.8.001 - 4 Dec 2017 - Added ODK German charts
- 2.7.001 - 24 May 2017 - Added chart2000.com year chart for recent years
- 2.6.001 - 18 Aug 2016 - Added the Billboard Hot200 album charts and adjusted factors to suit
- 2.5.001 - 2 Dec 2015 - Change the year weight calculation to focus on multi-region success
- 2.4.001 - 27 Nov 2015 - New values for parameters, short lived calculation change
- 2.3.037 - 30 Sep 2014 - Added extra recent data (like Japan number 1, Oscar and Grammy)
- 2.3.001 - 30 Jan 2013 - Readjust parameters to bring results back into line
- 2.1.026 - 27 Aug 2012 - Radical refactoring of generation code, should have no impact on results
- 2.1.022 - 20 Jul 2012 - Reduced the scaling for years after 2007 to reduce effect of years with little real data
- 2.1.000 - 13 Nov 2011 - Made small adjustment to the weightings to smooth over some cross year issues
- 2.0.000 - 22 Oct 2011 - Changed the calculation method to smooth out region issues
- 1.10.022 - 14 Jun 2011 - Added SNEP singles chart for France
- 1.10.019 - 30 May 2011 - Refactored chart tables
- 1.10.001 - 5 Mar 2011 - Billboard data from 2009, changed year controls
- 1.9.050 - 22 Jun 2010 - Add first quiz page
- 1.9.001 - 4 Jan 2010 - Start push to add charts 2006-2008
- 1.8.036 - 19 Oct 2009 - Added region image to artist pages
- 1.8.009 - 24 Aug 2009 - Clarify rules on live and remix tags
- 1.8.001 - 11 Aug 2009 - Updated version of UK charts
- 1.7.030 - 5 Jul 2009 - Gold charts for US, UK, Germany & France
- 1.7.001 - 27 May 2009 - Belgium data, regional comparison pages
- 1.6.024 - 4 Apr 2009 - Add feedback form to each generated page
- 1.6.001 - 13 Dec 2008 - Retired Billboard chart (Bullfrog data is better) and added certifications
- 1.5.039 - 7 Sep 2008 - Brazil chart added
- 1.5.033 - 24 Aug 2008 - Improved layout of site index pages
- 1.5.015 - 2 Aug 2008 - Site Index, Chart Consolidation
- 1.5.002 - 29 Jun 2008 - Extra per-page info, Bullfrog chart added
- 1.4.007 - 14 Jun 2008 - Analysis of greatest songs and albums
- 1.4.003 - 10 Jun 2008 - Tokyo chart added, page structure redefined
- 1.3.072 - 25 May 2008 - Osaku chart added from Japan
- 1.3.027 - 20 Oct 2007 - Reconciliation across charts
- 1.3.013 - 3 Oct 2007 - Polish charts, check page structure
- 1.2.036 - 15 Sep 2007 - Escaping special characters in names
- 1.2.031 - 4 Sep 2007 - Version number placed into the PID record
- 1.2.021 - 30 Aug 2007 - Order chart entries in importance order
- 1.2.011 - 21 Aug 2007 - OzNet Charts, indirect links
- 1.2.008 - 12 Aug 2007 - Virgin albums, WXPN charts
- 1.2.004 - 5 Aug 2007 - Added album listings to output
- 1.1.004 - 20 Jul 2007 - Added Swedish, Norwegian, Austrian & Swiss charts
- 1.0.463 - 18 Jul 2007 - Added French charts
- 1.0.457 - 10 Jun 2007 - Initial Public Release
Previous Comments (newest first)
12 Jun 2018
You forgot to write some highlights for versions after 2.8.001. I found the chart is now different from 2.8.001, even 2.8.007.
The notes only highlight major changes of algorithm. The whole point is that EVERY version is different, that's why there are version numbers
16 May 2018
albums below 3000
do you have a list of top 1000 albums, that includes all albums from 1000-the most?+
Yes, the "Albums" page lists the top 1,000 albums. And the CSV file has a specific column for it ('albumentry_pos')
17 Aug 2016
Full Spreadsheet
Can you bring back the full spreadsheet that lists all the songs and albums on this site?
The full spreadsheet never went away. It can be accessed through the versions page, or various other links. We suspect that you have a stale link to a version that is no longer available, remember that the CSV file name includes the data version number, so at the time of writing this the file is called "tsort-chart-2-6-0001.csv" but shortly the most recent version will be "tsort-chart-2-6-0002.csv" (and so on).
The version number is crucial to understanding what the data is telling you, so we won't ever create a file called (for example) "tsort-chart.csv". The text at the foot of each page tells you the version number (and links to the version page)
15 Mar 2016
Regional rankings for the long list csv file
First, I want to thank you guys on this site for the hard work that has been put into making these lists and making them available to us.
Second, I have a request about the regional scores in the ultra-long list CSV file (tsort-chart-2-5-0022.csv). Currently there are rankings per region columns which only show the top 20 rankings, leaving many blank spaces in the columns for songs not in the top 20. Surely if you used the raw score for usa,eur,eng as in the top5000songs-2-5-0022.csv file, it would be more useful as it could be sorted and ranked by these scores for every song on the list and then filtered by year, decade etc?
The regions in the tsort-chart files are different from those in the raw files. In tsort-chart there is "North America" (i.e. the US and Canada) and "Europe" (i.e. UK, Germany, France, Ireland etc). In the raw scores the regions are "USA", "Other English Speaking" (UK, Australia, Canada, Ireland etc), "Rest of Europe" (Germany, France, Italy etc) and "Rest of World". Notice that Canada, Australia and the UK, for example, are grouped differently.
The listings in tsort-chart are meant to include only reliable ones; by restricting ourselves to just the top 20 (and only in certain years) we can achieve that. Of course that does mean that most of the songs are blank (i.e. we don't have enough data to deliver a reliable answer).
The listings in the top5000songs and top3000albums files are far closer to the raw values. Our expectation is that people who want that level of detail won't mind a bit of calculation and will be aware that a song scoring 5.001 is "really" the same as one scoring 5.000 (they are both in position 4000 or so). The full calculation we use is described in the FAQs (and requires the yearfactor csv file as well).
If you want to know what was the 927th biggest hit in Europe in the 1970s you can use the top5000songs (and the yearfactor file) to calculate exactly that. We would suggest that the results produced would be, let's say, unreliable, but we provide the data for anyone to do that.
If, while you are trying things out, you uncover anything interesting we'd like to hear from you.
18 Dec 2013
2012
hello, hope to see the top songs for the year 2012 soon
The first version should be available in the next few weeks... they should be trustable in about 2017
17 Jul 2013
hello everybody, I really like this site. Hope that it is still being updated since lately there were no updates, especially to the recent years top 100.
It is being updated, but the recent charts have not been touched for some time
1 Sep 2012
2.1.0026
this version doesn't contain any evaluation in song_pos+year_pos+artist_pos!?
The refactored code which shouldn't have changed anything had a bug in it, as a result the fields you mention were blanked out in the CSV file.
We've fixed the code (and added in entries for the decade positions, North American positions and European positions).
Thanks for pointing out our mistake
12 Nov 2011
reshuffle
i really like this site. may i ask how come recently a reshuffle in the top lists occured
The way we calculate the scores was changed quite radically last month. That is why the data version number went from 1.10 to 2.0
The reason for the change was that some users pointed out some anomalies in the way songs from the 1990s were ordered, particularly music that had success in the USA but didn't do well in Europe. We modified the scoring system to overcome some issues with having too many charts from smaller countries.
The first attempt introduced some other unwanted features so the algorithm has been further tuned. We hope that the overall result is better.
If you see any results that look "odd" tell us about them.
10 Mar 2011
looking for all the number 9 chart positions can you help please
Looking for all the number 9 chart positions can you help please
The first thing to say is that we list positions from a large number of charts, so if you want data from a particular chart you'd find it easier to get that data direct. For example, if you just wanted the songs that reached number 9 in the billboard charts then you'd find it easier to use the "Bullfrog" listing (see the "Source Charts" page to see where to get it). Then you can use a spreadsheet program to identify entries that reached the number 9 slot.
The easiest way to do any calculation is to download the CSV file (from the page that describes the version numbers).
It's not clear if you are looking for the number 9 positions in our annual charts, or for songs that reached number 9 in one of the source charts.
If you load the CSV file into a spreadsheet you should be able to filter on the "year_pos" attribute and select just the songs with 9 in that column.
Alternately if you're looking for songs that reached number 9 in the source charts we don't have the data to identify all songs that were at the number 9 position at some time in their run, but we can tell you which songs peaked at number 9.
The easiest way to do this calculation is to search the CSV file for the string " 9 ". For example on Linux (or CygWin) the command:
grep ' 9 ' tsort-chart-1-10-0003.csv
will list all the songs that peaked at number 9 anywhere. Alternately the command:
grep 'Holland 9 -' tsort-chart-1-10-0003.csv | grep " 197[3-7]"
will identify all the songs that peaked at number 9 in Holland and then select only those which have years in the range 1973-1977.
Combining simple searches in this way can quickly identify songs, for example the command:
grep 'Holland 9 -' tsort-chart-1-10-0003.csv | grep -i london
shows that the only song which peaked at number 9 in Holland and mentioned London in the title was Ralph McTell's "Streets of London" from 1974. Of course we don't know why anyone would want to do that particular search.
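For anyone without grep to hand, the same chained searches can be sketched in Python. This is only an illustration: the helper names grep and read_lines are made up here, and the patterns simply mirror the commands above.

```python
import re

def grep(pattern, lines, ignore_case=False):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [line for line in lines if regex.search(line)]

def read_lines(path):
    """Read a chart CSV file as raw text lines (ASCII, line ends stripped)."""
    with open(path, encoding="ascii", newline="") as handle:
        return [line.rstrip("\r\n") for line in handle]

# Usage, mirroring the piped commands above (the file must be downloaded first):
#   lines = read_lines("tsort-chart-1-10-0003.csv")
#   seventies = grep(" 197[3-7]", grep("Holland 9 -", lines))
#   london = grep("london", grep("Holland 9 -", lines), ignore_case=True)
```

Nesting the calls plays the same role as the pipe: the inner search narrows the lines and the outer one filters what is left.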
20 Feb 2011
CSV file II
Yes i'm novice.
I really appreciate the "lesson" above which shows the great character you have. A big big THANK YOU.
27 Jan 2011
CSV file
I've got ideas about what I'd like to do but have no skills on working with your excel file.
One of them would be making a list with 2011 songs (1011 more than you show) representing the actual year.
Would you please teach me how to do it?
Another thought: Would it be possible for you to do a month song table?
For example: a list of 30 songs that did good in january in all years. 30 songs = 30 days :)
Every now and then I visit you to appreciate your great work.
Thank you guys!
The CSV file allows you to try all kinds of different orders.
Let us do some examples using Excel 2007 (you'll be able to do similar things with any good spreadsheet program). The explanation below assumes that you really are a complete novice in using Excel; if you find it a bit too simplistic we apologise.
First to list the 2011 top songs just select the "Sort" function under the "Data" tab. This will give you a "Sort" dialog. Make sure that the "My data has headers" toggle is ticked. Sort by "type" ("Z to A"), then by "score" ("Largest to Smallest"), then by "artist" ("A to Z"). The "Add Levels" button lets you insert the additional criteria.
That should give you all the songs in order, the first 1000 will already be numbered for you in the song position column (song_pos).
Suppose that instead of sorting all the songs we want the 1000 highest scoring songs for the period from 1975 to 1995. We can do this by inserting a new column. Click with the right mouse button on the "F" just above the column heading that says "song_pos" and select "Insert" from the menu that comes up. That gives us a new blank column. In the top cell put the string "in_period". Click on the second cell then in the text entry area at the top (next to where it says "fx") enter the text '=IF(D2<1975,"",IF(D2>1995,"","yes"))'. Now click on that cell and press <Ctrl>-C. Click on the F3 cell, scroll to the bottom of the sheet and hold the <Shift> key down while clicking on the F64355 cell; that should highlight a column of cells. Finally paste with <Ctrl>-V.
Now if we go to "Data"->"Sort" again, add "in_period" as an extra sort criteria (use the blue arrow to make it the top one). We have an ordered list of the songs from 1975-1995.
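The same selection can be sketched in Python for those who prefer a script to a spreadsheet. This is a minimal illustration, assuming each row is a dict with the "type", "year", "score" and "artist" columns (for example from csv.DictReader) and that every score parses as a number; the function name is made up.

```python
def top_songs_in_period(rows, first_year, last_year, limit=1000):
    """Return the highest scoring songs whose year lies in the given range."""
    selected = []
    for row in rows:
        if row["type"] != "song":
            continue
        try:
            year = int(row["year"])
        except ValueError:
            continue  # the year may be unknown, so skip rows without one
        if first_year <= year <= last_year:
            selected.append(row)
    # Largest score first, then artist A to Z, as in the sort dialog above.
    selected.sort(key=lambda r: (-float(r["score"]), r["artist"]))
    return selected[:limit]
```

Calling top_songs_in_period(rows, 1975, 1995) corresponds to the "in_period" column plus the sort described above.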
How about if we want to adjust the scores? For example let's find the most outstanding albums of each year. First we have to decide what the term "outstanding" means; we have a varying number of charts for each year, so the scores are biased towards years with lots of charts. So let's adjust the scores by dividing them by the average score of the 10th to 20th positions in each year.
So first sort by "type" (A to Z), then "year", then "ayear_pos" (Smallest to Largest). Now insert two columns to the left of "song_pos". One we'll call "sum10to20", the other "factor". In the first column (cell G2) insert the formula "=IF(OR(D2<>D3,N2>20),0,IF(N2<10,G3,G3+E2))" (where D is the "year", and N is the "ayear_pos"). Copy that to the rest of the cells in that column. In the H2 cell insert the formula "=IF(D2=D1,H1,G2/10)". Now all the cells with the same album year should have the same factor in them. Now select column H by clicking on the button; all the cells should be highlighted with a solid border. Copy these cells with <Ctrl>-C and on the "Home" tab select the pulldown menu under "Paste", then in that menu pick "Paste Values". Now we can shuffle the cells and the values in column H won't change. Change column G's name to "adj_score" and in cell G2 insert the formula "=IF(H2=0,0,E2/H2)" and copy it to the rest of the column. Finally sort by "adj_score" (Largest to Smallest).
This gives us a list of the albums that are furthest ahead of their contemporaries, like "Genius of Modern Music, Vol 2". You might decide that the factor is a bit too aggressive; changing the formula in column G to "=IF(H2=0,0,E2/SQRT(H2))" makes the order more reasonable (after sorting again of course).
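The adjustment can also be sketched in Python. This is an illustration under stated assumptions, not a copy of the spreadsheet: it uses the current column name "albumyear_pos" for the album's position within its year, it takes a true mean of the scores at positions 10 to 20 (the spreadsheet formula divides the sum by 10), and it leaves out years that have no albums at those positions. The function name is made up.

```python
from collections import defaultdict
from math import sqrt

def adjusted_album_scores(rows, use_sqrt=False):
    """Return (adjusted score, artist, name, year) tuples, largest first."""
    albums = [row for row in rows if row["type"] == "album"]

    # Collect the scores of the albums placed 10th to 20th in each year.
    mid_scores = defaultdict(list)
    for row in albums:
        pos = (row.get("albumyear_pos") or "").strip()
        if pos.isdigit() and 10 <= int(pos) <= 20:
            mid_scores[row["year"]].append(float(row["score"]))

    result = []
    for row in albums:
        scores = mid_scores.get(row["year"])
        if not scores:
            continue  # no factor available for this year
        factor = sum(scores) / len(scores)
        if factor <= 0:
            continue
        # The square root softens the adjustment, as suggested above.
        divisor = sqrt(factor) if use_sqrt else factor
        adjusted = float(row["score"]) / divisor
        result.append((adjusted, row["artist"], row["name"], row["year"]))

    result.sort(key=lambda item: item[0], reverse=True)
    return result
```

Passing use_sqrt=True corresponds to the gentler SQRT variant of the formula.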
Many of the charts we use don't tell us which month an entry belongs to. And, of course, if a song enters a chart in May and is in the charts for 12 weeks it probably spent more of June in the charts than May. So it's hard to see how we could calculate a reasonable "Monthly" chart.
We appreciate your support and suggestions.
14 Oct 2010
version control
I have been following your ranking system for a long time. I like it a lot.
The only complaint I have is that you change the ranking too frequently. I would recommend that you make the ranking change say every half year. Each half year ranking you give a version say Ver. Jan 2010 etc. It is the same thing like your software have different versions.
By doing this it will make my life easier cause I am using your ranking to collect the music.
Every time you make the change I have to adjust too. It is almost an impossible job to follow your floating ranking.
Thank you for your consideration. Or you can find some better solution.
The obvious solution would be for you to download the CSV file (linked on the "Version" page) and use that for a period of time.
We estimate that the data is better than 99.4% correct, this means that we "only" have 1600 or so errors!
We have a continual effort to identify and correct discrepancies between the various source charts, supported and encouraged by users that tell us about corrections that are needed.
It is true that these corrections end up shuffling some of the songs, especially those from before 1940, after 2007 and low down in each artist's song list. Those are exactly the ones we have least evidence for.
In addition these "less measured" parts of the data get changed when we identify and integrate new sources. We suspect that has a bigger impact on the items you are collecting.
However we feel that publishing the most recent version of the data, with the most up to date corrections, is more important than having a list that stays static at the places where the rankings are less certain.
Our version numbers are, of course, exactly like software release numbers (of course they ARE software release numbers). But our approach is more "open source" than "big software developer", so, like open source software we release often.
As we said at the beginning we would suggest you download the CSV file and use it as your master list for a while, switching to a new version when YOU feel the time is right to do so (rather than when we decide to release a new set).
Oh, and thanks for the support.
13 Sep 2007
vince spain
When do you change the charts? Can a song move in the chart? What charts are closed?
As song or artist names are corrected they may affect the order of the charts. For example the Vaughn Monroe song "Riders in the Sky" was listed as "Ghost Riders in the Sky" in the European charts. So that chart had "Riders in the Sky" as number 7 of 1949 and "Ghost Riders in the Sky" as the number 33. When the European entries were changed to be the same as the rest of the charts the new chart placed the song at number 1 for the year.
The quality of the input data varies from one source to the next. As continual effort is expended to refine the data it is inevitable that the positions of some resulting entries will change (reflecting a higher quality set of results).