Re: Some thoughts about Strength Tests
I like the Elo system a lot. As far as I can judge, the ratings and rankings it produces are pretty fair and very useful for making predictions. Everything written in the above summary is, in my opinion, correct, except for one key assumption which is nowadays completely outdated:
"A further assumption is necessary because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and derive a number to represent that player's skill."

Why not? Let's have a try! Thanks to your work (and Eric's) it is possible to see the results and make the comparison to reach a final judgement. I do believe that it is also possible to use a large position test (with at least 200 positions and lots of chess computers) to get quite reliable Elo ratings with Elo-Stat by Frank Schubert. But that's another story.

Some day in the future (maybe not in our lifetime), I fully expect that you will be able to take the complete game database of, for example, Wilhelm Steinitz or Garry Kasparov, plug it into the
master computer program and get the strength results, overall and year by year for each and every player, and accurately compare the evolving improvements of chess knowledge throughout history for players and computer programs.

Some day in the future, would it be possible to work through our Schachcomputer.info tournament list...? You need a starting point in "history epoch Elo" for initializing, just as you needed a starting point in modern Elo calculations ("initializing 1966-69"). Where is the difference at the start? The smaller database, what else?

"While we can safely bet that Elo did a careful job of rating historical players, inevitably many choices have to be made in such an attempt, and other approaches could be taken. Indeed, Clarke
made an attempt at rating players in history before Elo. Several other approaches have recently appeared, including the Chessmetrics system by Jeff Sonas, the Glicko system(s) of Mark Glickman, a version of which has been adopted by the US Chess Federation, and an unnamed rating method applied to results from 1836-1863 and published online on the Avler Chess Forum by Jeremy Spinrad. By and large these others have applied sequential updating methods, though the new (2005) incarnation of the Chessmetrics system is an interesting exception (see below) and Spinrad achieved a kind of simultaneous rating (at least more symmetric in time) by running an update algorithm alternately forwards and backwards in time. There are pros and cons to all of these."

Many years ago there was an article in CSS about Jeff Sonas; his contribution did not convince me. Is the Sonas rating formula better than Elo? No, IMHO. AFAIK John Nunn criticized a change by Jeff Sonas concerning the prediction of game results ("Nunn on the K-factor: show me the proof", 2009).

Mark Glickman: AFAIK his Glicko systems (unlike the Elo system) aren't totally stable. They could be distorted when data is intentionally manipulated, which of course is not the case here. https://www.reddit.com/r/TheSilphRoa...jor_flaw_in_a/

Is there a Glicko program that can use PGN data for calculating Elo, or are only Excel sheets provided?

... as there is a difference between performance and strength. For example, De Labourdonnais may have performed at a level of 2600 against his opponents.
But the strength of play in the early 1800s may also certainly have been lower than what it is today. This is the difference between strength and performance.

The Elo system measures only performance, nothing else. So IMHO that's the drawback of your system, which uses the moves of a game: you are looking not at the performance but only at the strength of a person, which is not the aim of the Elo system. To put it more drastically: the strength of a person or computer is totally irrelevant for calculating Elo; only performance matters! Nothing but 1-0, 1/2-1/2, 0-1, simple as that!

In summary, similar to EDOChess, who rates his results with EDO, and as my strength tests are not performance based, I have changed all references to ELO to STR, which is short for STRENGTH or
SPACIOUSMIND TESTS RATING. Whichever you prefer. ... In my tests I have changed ELO Sneak to STR Sneak, which you will see in future updates that I will provide for download.

You shouldn't call it Elo... STR... or maybe StrElo.

My STR ratings formula under Renaissance was of course created to approximate ELO, in order to have a fun comparison. The exact same calculations will be used for all future tests and the test
results may likely differ through the history of the games; however, the constant will always remain the same, which is the distance between the Master and its subjects, game by game and on average.

The main thing, after all: it's fun!

Regards
Hans-Jürgen
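For readers who want to see the "only results matter" point from this post in concrete terms, here is a minimal sketch of the standard Elo expected-score formula and rating update. The only inputs are the two ratings and the game result (1, 0.5 or 0), never the moves. The function names and the K-factor of 20 are assumptions chosen for illustration; they are not taken from Elo-Stat or from the STR tests.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> float:
    """New rating of A after one game.

    score_a is the game result from A's point of view: 1.0, 0.5 or 0.0.
    k is the K-factor; 20.0 is an illustrative assumption, not a recommendation.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Two equally rated players: the expected score is 0.5 each.
    print(expected_score(2000, 2000))
    # A win against an equal opponent gains half the K-factor.
    print(update_rating(2000, 2000, 1.0))
```

How well or badly the moves were played never enters the calculation, which is why a move-based strength figure such as STR measures something different from Elo.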