Saturday, March 16

What a Great Men's Championship at 2013 Worlds

... said no one ever.

Figure skating fans were already worked up over the scoring of four-time World Pair Champions Aliona Savchenko and Robin Szolkowy earlier in the day, when major errors on both individual jumping passes still earned them 2nd place in the free skate and the silver medal overall.

And just when we thought the 'controversy of the competition' was out of the way, Patrick Chan fell all over the ice in his free skate yet still won the overall World title by less than two points over Denis Ten, who had the skates of his life in both programs. Chan's free skate was only deemed 5.51 points worse than Ten's, and he held a substantial lead following his amazing world-record-setting short program.

Current skaters, former skaters, and fans alike heated up Twitter last night. American skater Christina Gao created the hashtag #BSWorlds13, a play on the actual tag being used for Worlds with the F being replaced by the B and the S having just a slightly different meaning... you get it.

So why are so many people who dedicate time to the sport so fired up? In the end, Chan did land two quadruple toe loops in the free skate and had wonderfully choreographed programs full of intricate in-betweens that he made appear seamless. Shouldn't that be rewarded? Yes, what he did *successfully* absolutely should be rewarded.

THE PSYCHOLOGICAL GAME

Figure skating, whether we want to admit it or not, is a psychological game of scoring for both fans and judges.

How many times have we been totally bored by skaters one year, and then all of a sudden they come up with an interesting or unique piece of music and we decide to love them and their 'program', even if the quality of the skating has not improved at all?

How many times has someone skated to Turandot or any other warhorse music and people exclaimed what a beautiful program it was, when it really was a horrible interpretation to a great piece of music?

How many times have former Champion skaters delivered poor performances or watered down programs in both the technical and program component senses but still received marks that put them right near the top?

How many times has a quadruple toe loop or triple Axel caused an electricity in the building, which, in turn, seems to make the judges go way up with their program component scores versus skaters with lighter jumping content who may be much, much stronger skaters?

Let's stop there. Andrei Rogozine skated in the second group of four last night in the men's Championship. He was skating on home ice, and produced a brilliant effort as far as the jumps were concerned: a quad toe loop and two triple Axels-- one as a three-jump combination. He received 68.22 as his final components score, or a 6.82 average across the five sections that are marked.

Earlier in the night, both Peter Liebers and Jorik Hendrickx skated strong, quality programs with much more choreography and transitions than Rogozine, but they were scored at 66.72 and 60.06 points, respectively, on their program component sections. Liebers skated two slots before Rogozine, and Hendrickx was in the first starting group.
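
For anyone who wants to check those averages, here is a minimal Python sketch. It assumes the free skate components total is simply the sum of the five marks multiplied by a factor of 2.0, which is what the 68.22 and 6.82 figures above imply; the function name is made up for illustration.

```python
def average_component(pcs_total, factor=2.0, n_components=5):
    """Back out the average per-component mark from a factored PCS total.

    factor=2.0 is an assumption inferred from the post (68.22 -> 6.82).
    """
    return pcs_total / (factor * n_components)


# Components totals quoted in the post.
for skater, total in [("Rogozine", 68.22), ("Liebers", 66.72), ("Hendrickx", 60.06)]:
    print(f"{skater}: {average_component(total):.2f}")
```

Seen this way, the gap between Rogozine and Liebers is only about 0.15 per component, but it runs in Rogozine's favor.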

There are several phenomena happening here: in addition to the other points I listed above, we all know that there seems to be a strong correlation between start order and the raising of program components throughout a competition. Since the top skaters according to the 'ISU World Standings' compete late in the short program, the general assumption should be that those skaters do indeed deserve the highest components, right?

Well, the judges sure seem to think so. Or is it really that they truly *think* that?

I think there is a mix of situations happening. While the IJS was ideally created to score the skater versus a point system (as opposed to skaters versus each other), the influence of the big tricks technically, the crowd response, and the reputation of the skater all seem to go into the final marking of the program components.

For example, Liebers was great technically, had a very, very strong program choreographically, but was reserved in his interpretation to the music. This meant that the crowd reaction was pretty 'normal'. Rogozine, skating at home, lit up the crowd but had even less going on in terms of the interpretation to the music.

The judges have about a minute or so after a skater ends their program to check any questionable elements or change any scores AND enter their program component scores. I was also judging this in real time, calling my own levels for each element, assigning a grade of execution, and issuing program component scores all in the time that the judges just have to do two of the three.

When a judge has to be so fixated on the 13 elements a man has to perform within four and a half minutes, the components can become much of an afterthought. And this is why I believe we see so many instances of 'components by starting order' or by the technical effort. The crowd was fired up for Rogozine in that one minute the judges have to enter their marks. Psychology game-- for some of them.

WHAT NEEDS TO HAPPEN WITH COMPONENTS

There have been so many different instances of IJS in its near-ten-year existence. At one point, the panel was split into two: one that assigns the GOE of the elements while levels are still called by a three-person technical panel, and one that assigns scores for the five program components. That practice lasted one competition, and, if I remember correctly, the number of judges was deemed to cost too much for the ISU to send to each event.

The ISU holds workshops and the technical committee publishes criteria for the scoring of components-- for example, what a 5.00-level skating skill would look like versus an 8.00 skating skill. Yet we still see the panel of 9 judges fluctuate their components marks greatly after marking the exact same performance. Too much to do at once, and/or marking by start order or reputation.

Here's a task. Watch a program and ignore counting the rotations of jumps, the actual jumps themselves, the spin rotations, etc. Don't have a battle in your head whether jump X should have been a -1 or -2. Just watch the skating quality, what the skater is doing between the elements and how they are linked together, how much the skating matches the music, and things like that. You might see a totally different program-- for better or for worse.

Whether 'experts' of the sport or not, I think the same thing would happen for the judges, and that is why I recommend, once again, that the panels be split up. Five judges for components, and five judges for the GOE marking (which doesn't seem to differ as much as the components do). One judge more than there is now, and actually three fewer when you consider that 13 total are chosen and then rotated between segments of the competition. The ISU should produce some type of test for the judges who would like to score components where they have to name, for example, all of the criteria of skating skills from score 1.00 to 10.00. Performance/execution and interpretation are always going to be subjective, but not to the point that a skater completely ignoring the music is receiving a 7.00 or higher for the way they 'interpreted' it.

Here's something else you can try. Watch a competition. When it comes to the highest-ranked skaters, use the following formula for your own program component scoring:

Final Group: Start your Skating Skill mark anywhere from 7.50 to 9.00. Your decision.

From there, -0.75 for transitions, -0.25 for performance execution, +0.25 for choreography, and +0.25 for interpretation from that original score. Odds are you will be right in line with the judges!
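
If you would rather let a script do the guessing, the tongue-in-cheek formula above can be sketched in a few lines of Python. The offsets come straight from the formula; the function and variable names are hypothetical, and this is a parody of the pattern, not anything the judges actually use.

```python
# Offsets from the Skating Skills mark, as listed in the formula above.
OFFSETS = {"SS": 0.00, "TR": -0.75, "PE": -0.25, "CH": 0.25, "IN": 0.25}


def predicted_components(skating_skills):
    """Return the five 'predicted' component marks for a final-group skater."""
    if not 7.50 <= skating_skills <= 9.00:
        raise ValueError("the formula starts Skating Skills between 7.50 and 9.00")
    return {name: skating_skills + offset for name, offset in OFFSETS.items()}


print(predicted_components(8.50))
```

Pick a starting mark, run it, and compare the five numbers to the protocol afterward.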

The 'predictability formula' of the program components needs to go. We need to see transition marks that are higher than the other four-- which is rare. We need to see interpretation marks much lower than choreography marks if that is what the case actually is, and so on. Give the judges ONLY these five areas to focus on, and we might see a greater fluctuation that is more indicative of what the skater is putting out there.

WHERE DO FALLS FIT IN?

A comment I see frequently on Facebook and Twitter between fans discussing figure skating is that they cannot understand how a skater still had very high components scores with a fall. As it stands, there are no specific criteria for reducing components after major errors such as falling.

In the early days of IJS (it was called COP), any fall made the skater lose 0.50 points off whatever the value of their performance and execution mark would have been. The problem is that only the most elite skaters were kept in mind. What if you are judging a novice or even junior-level competition and the skater would only be worthy of a 3.50 or so to begin with, and then they fall 4 times? Is it really a 1.50 performance now? Obviously, this isn't going to work.

A comment made to me when I suggested a few ideas to fix the system was that, "I don't think being reduced to Algebra 2 is the answer." But really-- isn't skating already a numbers game? So, my idea for performance and execution would be the following:

For any fall (major or minor), 5% is reduced from the P/E score that was to be given.
If a judge enters a 9.00 for Patrick Chan but he fell once, he would now receive an 8.50 at best. (Round to the nearest 0.25 increment).
If Chan were to fall twice, 10% would be reduced from the initial P/E score. He's now at an 8.00 maximum.
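
The rule above can be sketched in Python as follows. This is only one reading of the proposal: it assumes the 5% stacks additively per fall (so two falls means 10%) and that "nearest 0.25" rounds halves upward; the function name and parameters are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def reduce_pe_for_falls(pe_mark, falls, pct_per_fall="0.05", step="0.25"):
    """Reduce a P/E mark by 5% per fall, then round to the nearest 0.25."""
    mark = Decimal(str(pe_mark))
    reduction = Decimal(pct_per_fall) * falls      # 1 fall -> 5%, 2 falls -> 10%
    reduced = mark * (Decimal(1) - reduction)
    increment = Decimal(step)
    # Count how many 0.25 increments fit, rounding to the nearest whole count.
    n_steps = (reduced / increment).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(n_steps * increment)


print(reduce_pe_for_falls(9.00, 1))
print(reduce_pe_for_falls(9.00, 2))
```

Because the reduction is a percentage, a novice marked at 3.50 who falls four times would lose 20% of the mark rather than a flat 2.00 points.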

Many fans seem to want performance/execution scores to drop into the 5.00 range for someone like Chan if he's falling all over the place-- but if he falls twice and does eleven other elements beautifully in the free skate, does he really deserve something like a 5.00? I'd be fine with an 8.00 after two falls if the judge deemed his level to start at a 9.00. This section would need some tweaking, but the general concept is presented.

Falls can also affect, for example, the interpretation-- but that is up to the PCS judge to decide rather than having a 'required' reduction.

THE BASE VALUE AND GRADES OF EXECUTION

Another phenomenon I see often in the scoring of programs is that top-name skaters seem to be given much more generous grades of execution for their elements versus a skater that has not 'paid their dues'.

I got into a discussion with a Twitter follower over the scoring of Yuna Kim's triple Lutz-triple toe short program combination the other day versus Kaetlyn Osmond's triple toe-triple toe combination. Kim entered the jump with no transitions, had nice distance on the Lutz, and had good rotation on the toe loop. All-in-all, a *good* jumping pass that carries 1.9 more points base value than Osmond's.

Osmond, on the other hand, did footwork down half of the rink, including turning in the opposite direction directly preceding the jump, carried great speed, and had big distance on both of her jumps. She also did edge-work and a kick to show her balance right after landing the combination. However, her final grade of execution was less than that of Kim's.

Why?

Reputation judging. It's basically telling skaters like Osmond that no matter how much more difficult she actually makes the element for herself, good luck earning those top points. Because trust me-- the fact that she's doing any kind of footwork into her *combination*, let alone the fact that it goes in both directions, is insanely difficult. I'd even argue her combination is worth a +3 while Yuna would be at a +1 for me. In the end, Kim would score a 10.8 for the element and Osmond a 10.3. I could totally live with that.
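
The 10.8 and 10.3 figures can be reproduced with a short Python sketch. The base values (10.1 and 8.2) and the 0.7 points per GOE grade are assumptions back-fitted to the numbers quoted here, namely the 1.9-point gap in base value and the two element totals; they are not taken from an official table, and the function name is hypothetical.

```python
from decimal import Decimal


def element_score(base_value, goe, goe_step="0.7"):
    """Base value plus GOE grade times an assumed points-per-grade step."""
    return Decimal(base_value) + goe * Decimal(goe_step)


# Assumed base values, chosen to match the 1.9-point gap quoted in the post.
kim = element_score("10.1", goe=1)       # 3Lz+3T at +1
osmond = element_score("8.2", goe=3)     # 3T+3T at +3

print(kim, osmond, kim - osmond)
```

Even with the maximum grade, Osmond's easier combination still lands half a point behind Kim's under these assumptions.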

That Twitter follower, by the way, replied to me with something about how Osmond scored really well considering it was her 'first time' [at the World Championships]. Last time I checked, you mark what you see on the ice, not how long the skater has been around or what titles they have earned previously.

Reputation judging in the GOE needs to go, too. Those points really add up more than you think.

MY OWN JUDGING AND THE SERIOUS PROBLEM

Those of you who follow me know that I have been scoring many of the programs myself throughout the week of the World Championships. In some cases, there has been as little as .05 or .10 points separation between myself and the judges on the *final* segment score. Great, right?

It would be great if my own scores and following the rules actually made sense to me. Even with Patrick Chan's disastrous skate last night, I had him at 164.28 points for the segment; Denis Ten earned 166.61 points from me. 2.33 points-- that's it. You've got to be kidding me.

The funny thing is that while I was scoring the first half of Ten's program (in which he was completely on fire), I was thinking to myself, "Wow, this score is going to blow away Chan's on my score card." But then when I saw what my final marks produced, it was definitely a WTF moment.

So we already know the judging can be suspect when it comes to program components and the grades of execution, but now we have a flawed system itself, too? Double whammy.

A few years ago, the grades of execution for poorly-completed elements (negative GOE's) were reduced so that trying a difficult element wouldn't be as much of a risk. This likely happened after Mao Asada was getting hammered for her under-rotated triple Axels as well as male skaters taking themselves out of contention when their quads failed. We hardly saw quadruple jump attempts for a few years.

Then the ISU decided that a fully-rotated quadruple toe loop, for example, with a fall, should earn 7.3 total points for the element. If you also consider the mandatory -1.00 point deduction for the fall, the skater essentially has earned 6.3 points. A base value triple Lutz, on the other hand, garners 6.0 points. Yes, a quadruple toe loop is much, much more difficult than a triple Lutz. But in what other sport does a complete *failure* of an element still earn the athlete most of the points they were attempting?

I'm not saying that a 0 is necessary for the element. What I would suggest is that there is another column added to the GOE scale where elements that are fully rotated, but fallen on, only receive X percent of the initial base value. Last night I suggested 25%, which makes the value of the jump 2.58 points.

I would do away with the 1.00 deduction for falls but rather it would be applied to the performance/execution score as described above.

Harsh to get a 2.58 for the quad failure, isn't it? The skater is still getting nearly double what they would receive if they'd double the jump, and I think that is absolutely fair. There still has to be some risk involved.
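
To make the comparison concrete, here is a minimal Python sketch of the current treatment of a fallen quad toe loop next to the proposed one. The 10.3 base value is an assumption inferred from the 7.3 and 2.58 figures above (it is not stated outright), and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def fallen_jump_current(base_value, goe_penalty="3.0", fall_deduction="1.0"):
    """Current rules as described above: base value, minus GOE, minus 1.00 for the fall."""
    return Decimal(base_value) - Decimal(goe_penalty) - Decimal(fall_deduction)


def fallen_jump_proposed(base_value, fraction="0.25"):
    """Proposal: a rotated-but-fallen jump keeps only a fraction of its base value."""
    value = Decimal(base_value) * Decimal(fraction)
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


QUAD_TOE_BASE = "10.3"  # assumed base value, inferred from the post's figures

print(fallen_jump_current(QUAD_TOE_BASE))
print(fallen_jump_proposed(QUAD_TOE_BASE))
```

The fraction is the knob to argue about: 25% is the number suggested above, and anything else can be swapped in to see where the fallen quad ends up relative to a clean triple Lutz at 6.0.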

CHAN WITH ALL OF THIS APPLIED

Re-Scoring Patrick Chan's program with all of the aforementioned changes, you get this:

4T+3T 16.97
4T 13.16
3Lz 1.50 (FALL)
StSq4 5.60
CCSp3 3.73
3A< 1.65 (FALL)
3Lo 6.61
3F+1Lo+3S 9.70
FSSp3 3.24
2Lz+2T 3.87
ChSq1 3.40
2A 4.06
CCoSp4 4.29

77.78 total for TES now. He had 82.13. By the way-- reputation judging on that final 2A and a lot of other elements here. +3, really? Even +2-- really?!

SS 9.11
TR 8.96
PE 8.61 (-10% because of falls) now becomes 7.75.
CH 9.00
IN 8.96

87.56 for PCS now, while he had 89.28.

Total score isn't that much different, as there is no more 1.00 deduction for falls in my proposal. 165.34 versus the 169.41 he actually got.
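
The arithmetic behind those totals can be checked with a short Python sketch. The element and component numbers are copied from the lists above; the 2.0 components factor is an assumption implied by the 89.28 total, and the helper name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Re-scored element values, copied from the list above.
elements = [
    ("4T+3T", "16.97"), ("4T", "13.16"), ("3Lz", "1.50"), ("StSq4", "5.60"),
    ("CCSp3", "3.73"), ("3A<", "1.65"), ("3Lo", "6.61"), ("3F+1Lo+3S", "9.70"),
    ("FSSp3", "3.24"), ("2Lz+2T", "3.87"), ("ChSq1", "3.40"), ("2A", "4.06"),
    ("CCoSp4", "4.29"),
]

# Panel averages for the five components before any fall reduction.
components = {"SS": "9.11", "TR": "8.96", "PE": "8.61", "CH": "9.00", "IN": "8.96"}


def rescored_segment(elements, components, falls, factor="2.0"):
    """Return (TES, PCS, total) with P/E cut by 5% per fall and no 1.00 fall deductions."""
    tes = sum(Decimal(value) for _, value in elements)
    marks = {name: Decimal(value) for name, value in components.items()}
    reduced_pe = marks["PE"] * (1 - Decimal("0.05") * falls)
    marks["PE"] = reduced_pe.quantize(CENT, rounding=ROUND_HALF_UP)
    pcs = sum(marks.values()) * Decimal(factor)
    return tes, pcs, tes + pcs


tes, pcs, total = rescored_segment(elements, components, falls=2)
print(tes, pcs, total)

# What he actually received: TES + PCS minus two 1.00 fall deductions.
actual = Decimal("82.13") + Decimal("89.28") - Decimal("2.00")
print(actual)
```

Changing `falls` or any single element value shows quickly how sensitive the final gap is to each piece of the proposal.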

Of course, the reputation judging for some of his elements seemed suspect, so in reality he probably should have been even lower.

However, since Denis Ten did not fall, all of his scores would stay the same and he'd wind up with 174.92 points in the program still, and this would be enough to win the title. Albeit, not by much, but enough.

People want to complain that my proposal is too much of a penalty for skaters trying the hard elements. I argue that it's just enough of a penalty that results actually make sense.

WHAT THE SKATERS NEED TO DO

I have never seen anywhere near the number of skaters voicing their opinions about the result as I did last night. The skaters need to work together to come up with modifications to the system that they think would best reflect the performances that are being put out there, and then present them to the ISU. Everyone seems to have an opinion about why things aren't fair, but unless you get to the root of *why* it isn't fair, then all we have are endless amounts of angry Twitter posts.

10 comments:

Anonymous said...

Many nice points, Tony.

Intuitively, I like the idea of having two separate panels (one for TES and another one for PCS), although I don't quite see how that can reduce the degree of judges' subjectivity and favoritism.

I understand you expect the judges to take the time to carefully evaluate every element / component and to come up with a thoroughly built up (more fair and objective) score.

I don't see how that can help eliminate the judges' self-fulfilling prophecy / reputation-based judging, especially when it comes to scoring the world's best skaters and when medals / Olympic spots are at stake.

Using your example "If Chan were to fall twice, 10% would be reduced from the initial P/E score. He's now at an 8.00 maximum.", I'd say the judges will simply enter higher initial marks, to reduce the impact.

Why would they do that? -- Why wouldn't they? Their line of reasoning still will be "he's the best skater and everybody knows that, he must win".

Honestly, I don't think there's a solution. The system cheats on itself all the time, because there's a great (but unspoken) need to manipulate the results in accordance with current politics, TPTB interests, money and stuff like that. I mean, did the Canadian choir in London even practice other countries' national anthems (let alone the Kazakh one)?

BTW, did you notice that Chan's "amazing" SP "world-record" is based on the marks that include the score from Judge number 9 and that score is outside the legit scores "corridor" BIG TIME? (according to ISU's own definition and calculation instruction).

Anonymous said...

Another suggestion from me is to change their policy about "corridor". The corridor seems to be one of the major causes of both reputation judging (e.g., Highly-ranked skater X receives high PE and IN despite having a complete meltdown and no connection w/ the music on the particular night) and the lack of distinctiveness across the five components scores (e.g., Skater Y receives virtually the same numbers across the five components).
Judges seem to be too worried about the corridor. This is why I suspect they gave Ms. Mao Asada's lackluster performance higher PE and IN than Ms. Akiko Suzuki's performance of her life at the 2012 NHK Trophy. The result seemed to be affected both by #1. reputation judging and by #2. the lack of distinctiveness across the five components. It's easier to automatically give safe numbers, which were high 8's for Ms. Asada and high 7's for Ms. Suzuki, across the five components, than to distinctly assess each PC score according to the performance quality on that particular day. As you said, it may be because the tasks of judging both GOE and PCS might have been too taxing. Yet, I don't really think that it would have been impossible for them to see what seemed so obvious to the audience.

My take is that they actually saw what we saw, but just worried too much about the corridor to judge honestly.

The corridor judging should go.

Anonymous said...

Anonymous 9:16 PM, what you say about the lack of distinctiveness across the five components due to the corridor fear, is a popular belief, AFAIK.

I went through many and many component scores of different skaters to see if it's indeed the corridor that influences the judges decisions. I came to the conclusion that it's not.

Unlike with GOE, with PCS the judges often use for instance 1/5 of the corridor room, and many times even much less than that (I came across some ridiculously low numbers). They could be more distinctive if they wanted to, they'd have stayed within the corridor boundaries, easily. But the judges keep doing that.

I think, this happens because as a judge you want your score to count / to influence the component's trimmed mean. For that, you need to constantly guess what the rest of the panel scores are going to be. Because the highest and the lowest score don't count.

Reading other judges minds with PCS is much harder than it is with GOE. There's only 7 GOE marks, while there's (theoretically speaking) a choice out of 40 available marks for each program component. A .25 can throw your score out of the calculation.
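
The trimmed-mean point can be illustrated with a minimal Python sketch. It assumes, as described in this comment, that only the single highest and single lowest marks on the panel are dropped; the nine-judge panel below is invented for illustration and is not taken from any protocol.

```python
def trimmed_mean(marks):
    """Average the marks after dropping one highest and one lowest."""
    if len(marks) < 3:
        raise ValueError("need at least three marks to trim both ends")
    kept = sorted(marks)[1:-1]
    return sum(kept) / len(kept)


# Hypothetical nine-judge panel for one program component.
panel = [8.00, 8.25, 8.25, 8.50, 8.50, 8.50, 8.75, 8.75, 9.50]
# Same panel, but the top judge goes even further out on a limb.
panel_bolder = panel[:-1] + [10.00]

print(trimmed_mean(panel), trimmed_mean(panel_bolder))

# Marks available per component: 0.25 to 10.00 in 0.25 steps.
available_marks = [0.25 * i for i in range(1, 41)]
print(len(available_marks))
```

In this sketch the outlying judge has no effect on the result whether the mark is 9.50 or 10.00, which is the incentive to stay near the pack that the comment describes.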

This is where skaters' reputations come into play, IMO. The judges study the protocols and learn that a certain skater can't possibly be given a mark below 8.00 no matter how he skates. That knowledge makes the guessing game SO much easier.

BTW, the judges already figured out a way to cheat on PCS, too.

Anonymous said...

Thanks for sharing your proposals.

Before and around the Vancouver Olympics, negative GOE for 3A was factored by 1.4 and negative GOE for quads was factored by 1.6. (Note: The 1-3 scale was applied to the positive GOE).

The ISU abandoned these scales because they wanted to reduce the perceived risks involved in attempting 3A and quads. They now use -3 to +3 GOE on 3A and quads. This is why we see 7.3 points rewarded to a fallen quad (6.3 after mandatory deduction for a fall).

How would you compare your proposed rule to that previous rule?

What I liked about the previous rule was that they punished not only falls, but also other mistakes. For example, a skater landed a messy quad. He rotated it, but two-footed and stepped out. This may be a little better than a fall, in that he was able to stand up. Nonetheless, it looks only a little better than a fall. Falls may appear the sloppiest and most disastrous mistake, but a lot of other sloppy landings can affect the performance quality. So I'd like to see the range of mistakes reflected in TES.

I basically like the idea of factoring PE by the number of falls. But I also would like to see other messy mistakes reflected in PE. I don't know if it's possible to implement this, but just some food for thought.

Pierre-Alain Varisco said...

Pierre-Alain Varisco, Bern, Switzerland
I was an ISU Judge and international judge in two different categories and I was an international referee. I do not agree with what has been written. The issue is not reputation judging, it is definitely not the corridor, and it is not a problem of time to evaluate GOE during 4 minutes or more and then to evaluate the components. The problem is just a human problem. What should be important is the function of a judge and not the persons themselves who are sitting next to the ice rink. The 6.0 system and the IJS now have given or are giving too much weight to the persons. So far, many persons, for the sake of their own career, are ready not to express their own opinions but to try to identify the trends of a competition and to judge according to what they have identified.
The solution is actually very easy: change the persons, exert more pressure on them, and give young judges the hope of becoming international/ISU judges once in their lives. Can you imagine, I was appointed as an international judge for the first time at 28. Until the retirement age of 70 for judges, I could have judged for 42 years and blocked a Swiss seat for that period of time. I stopped judging at 40 and now, after me, Switzerland has nominated 2 young judges to replace me (as far as I know, both judges had never hoped to become international).
To reduce the influence of the judging persons on the judgement itself, the duration of an international career should be 10-15 years (shorter of course if a judge demonstrates his/her inability in judging) and judges should not be older than 40, 45 or 50. During this period of activity, judges would not be interested in a career that could last 50 years as in the current system but would be interested in doing a good job, and I assume that those persons would be less sensitive to some other non-sporting arguments, because they would not care about them, since these would have no influence on their career.
And besides this, you should not change the IJS, which is really a jewel made by the ISU. The users should be changed, but not the system - of course some modifications could or would be necessary, I assume, but nothing should be changed in the fundamentals of the IJS.

Tony said...

To the first poster-- many current/recently retired judges seem to think the corridor is a big problem. We see at the end of the comments that Pierre-Alain Varisco replied that he doesn't think so. So I guess it just depends on who you ask.

As far as the P/E debate and mistakes, I really don't think there is a fair way to include all of the mistakes and the severity of them unless you create some wild chart that shows a 2-foot is 1.5% reduced, a step-out is 2.5% reduced, a fall-out is 3.5% reduced, a fall is 5% reduced, etc-- you get the picture. That would be crazy. Also, you have to include the thought that one fall for skater A might have much less of an effect than it does for Skater B, who has decided to sulk through the whole program now.

So, the more I have written about this, the more I think that the two marks just need to stay separate. Of course, if there was a world of separate panels for the GOE and the PCS, then if the PCS judges decided the performance has suffered, then they could mark it down accordingly.

I came up with a new point-earning system that I think would be much more successful. As the fourth person said, they have changed the GOE's so that a negative mark loses fewer points than in the prior editions of this system, and I think that is the biggest problem (see my latest article about Chan and if he got a -2 on every single jump).

Let's face it, a lot of people want to see PCS scores drop *greatly* after mistakes, but shouldn't we really focus on a total element score that matches the performance before we worry about the other half of the marks? That way, a strong program PCS-wise can stay that way (if it was skated such), but the lack of a TES score will make it hard for that skater to pass someone else who has delivered a great all-around skate.

Varisco has an interesting point. Many of these judges have been around for a long time! Some of them even were on panels in Olympics that took place in the 80's. Obviously a very different sport now.

I agree with your last point, Pierre-Alain. I think at the root that the IJS offers many more positives than negatives. It needs some tweaking (and some judges need additional lessons), but it makes so much more sense than just handing out a 5.9/5.9. Skaters now know what areas they need to work on, they can move up/down the standings wildly between segments, etc.

Pierre-Alain Varisco said...

The corridor is a problem if you consider your own career as a judge. If you consider the job that has to be accomplished by a judge, which is to judge and to show with your marks what you thought about the programs, then the corridor is no problem as long as you can explain why you judged as you did. If you consider - after having received an assessment - that it is unjustified to receive a punishment and that your opinion can be defended, you can appeal against that assessment. I never received any assessment and, considering what my colleagues were doing in their judgements, I made considerable variations between all my components. I was definitely out of the corridor, but I could always justify what I did even if it was completely different from what the others did.
So we are back to the central problem: I always considered myself, as a person and as a judge, completely unimportant, but the function of a judge was the central issue when I was judging. I tried to respect this function very deeply according to the very important ISU Code of Ethics.
(I judged 4 ISU worlds and 3 "junior worlds" in sys).

Anonymous said...

Mr. Varisco,

thank you for telling that.

I was totally planning on going through the protocols of all the events you judged, to see if your scores were indeed outside of the corridor.

But all the protocols I could find online, are from Synchro skating events:

http://www.google.nl/search?hl=en&as_q=Varisco&as_epq=Pierre+Alain&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=isu.org&as_occt=any&safe=images&tbs=&as_filetype=&as_rights=#hl=en&lr=&as_qdr=all&q=Varisco+site:www.isuresults.com&oq=Varisco+site:www.isuresults.com&gs_l=serp.3...160802.160802.2.163384.1.1.0.0.0.0.52.52.1.1.0.ckwqrh..0.0...1.2.7.serp.R4wIWgTI0uk&bav=on.2,or.&bvm=bv.44158598,d.d2k&fp=941398271e63502c&biw=1920&bih=901

I'm not sure if the same corridor deviation limits apply to Synchro skating.

The ones for singles and ice dance are described here (scroll down to page 2):

http://www.isu.org/vsite/vfile/page/fileurl/0,11040,4844-168473-185691-54897-0-file,00.pdf

In order to exceed the limits of the 7.5 points corridor, one needs to be completely 'wild' in their component scoring, IMO.

Assuming the corridor rules are the same for synchro skating, I calculated a few possible deviations of some of the extreme component scores in this event:

http://www.isuresults.com/results/wccj09/WCCJ_2009_Junior_SP_Scores.pdf

They are still within the corridor (I didn't go through every single score though).

Pierre-Alain Varisco said...

I am very sorry not to know to whom I am talking. You know everything about my judgement and my opinions and I do not know anything about you. :-)
If you want to meet me, I am on Facebook. Thank you for being as transparent as I am. If you can be, it would be great.

Anonymous said...

Mr. Varisco,

I'm just a reader of this blog (and a few other blogs about figure skating). My internet alias is Shush.

I've had a couple of years of stats analysis in college. I use my stats skills and a software package called SPSS to analyze ISU protocols, in my free time.

I can't possibly know which scores on that protocol are yours, I just checked those marks that were visibly distinct from the rest, they all are within the corridor limits. I googled your name to find those protocols. It's that simple.

I don't have a FB account, but I invite you (and other (former) ISU officials and judges) to contact me at this email address hsuhs@ gishpuppy.com

at any time, if you have information you would like to share (which I doubt?).

Regards,