Music Weekly: Judging the Jury
The 2023 Karol Szymanowski International Music Competition in Poland was unusually free of controversy: no scandals, no leaked jury fights, no angry editorials—just happy prizewinners, proud sponsors, and eager concert promoters. According to Jury Chairman Kevin Kenner, “the only complaints came from the press, disappointed that there was no scandal to report, and from members of the jury, who for the first time felt that they were the ones being judged.”
Why would the jury feel this way?
An international concert pianist and a seasoned juror at competitions including the Chopin, where he won the top prize in 1990, Kenner has witnessed more than enough adjudicating chaos. Partnering with economist Krzysztof Kontek, he developed a scoring system to neutralize the effects of extreme voting, whether born of conviction or collusion. The 2023 Szymanowski Competition adopted this system, and in several cases jurors whose scores consistently deviated from the average were removed entirely from the final calculation. The result: fairer outcomes, and perhaps a few bruised egos.
How is this algorithm different from the standard trimmed mean? Are there better methods? And is all this math comprehensible to a musician’s mind? To answer these questions, we turn to social-choice theory, a branch of economics devoted to how groups make collective decisions—in a competition, an election, or even a sports event.
In fact, it was the 2002 Olympic pairs figure-skating scandal that brought social-choice theory into public view. When the Canadian duo was controversially awarded silver behind the Russians, public outrage forced an investigation that uncovered a vote-trading deal implicating the French judge. The result: both pairs received gold, the judge and a French federation official were suspended, and figure skating's entire 6.0 scoring system was eventually replaced, with other judged Olympic sports, gymnastics and diving among them, re-examining their own procedures.
With global attention and money at stake, sport has learned to take scoring integrity seriously. By contrast, classical music, as sociologist Stacy Lom notes, operates with smaller audiences, weaker oversight, and virtually no appeals process—yet the need for fairness remains just as urgent: not only to protect the sanctity of art, but to ensure that genuine young talents can still be discovered.
Unfortunately, research by music psychologist Maria Manturzewska shows that “individual ratings of musical performance are not a reliable measure of achievement, even when given by experts of the highest level.” Bias seeps in from every direction: style, generation, gender, nationality, affiliation, reputation, sequence, fatigue—and, in worst cases, politics or self-interest. As cellist Julian Lloyd Webber once said of the Tchaikovsky Competition, “Get the right teacher or there’s no point entering… everyone knows classical music competitions are rife with corruption and bribery, but no one says it because when you’re in the profession, you don’t.”
Despite their flaws, competitions remain the most visible and accessible platform for young artists to gain recognition and take the first steps in their professional lives. With so much at stake, the mechanism that determines who receives these opportunities—and who can turn them into a lifelong career—should be at its best. Let us first examine the main systems—and how they fail.
1. Mean (Arithmetic Average)
The simplest and most common method: each juror gives a score, and the total is divided by the number of jurors. Easy—but dangerously sensitive to outliers. One extreme score can tilt the result completely. It depends, perilously, on juror competence and integrity. Good luck with that.
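A toy sketch in Python, with invented scores, makes the danger concrete:

```python
# Hypothetical scores: seven jurors, two contestants.
scores_a = [22, 23, 24, 23, 22, 23, 23]  # solid, consistent artistry
scores_b = [25, 25, 24, 25, 25, 24, 1]   # adored by six jurors, torpedoed by one

def mean(xs):
    return sum(xs) / len(xs)

print(round(mean(scores_a), 2))  # 22.86
print(round(mean(scores_b), 2))  # 21.29 -- one spiteful "1" hands the win to A
```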
2. Trimmed Mean (Dropping Extremes)
To dampen extremes, the highest and lowest scores are dropped before averaging. Stability improves, but so does mundanity. A visionary juror's high score disappears along with a malicious lowball; after trimming, even Glenn Gould would score in the middle. Worse, when the majority lacks discernment, trimming deletes unique artistry, as the sketch below shows. Some systems try to solve this by ranking rather than scoring.
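A minimal sketch, again with invented scores, shows both edges of the trade-off:

```python
# Drop the single highest and lowest score, then average the rest.
def trimmed_mean(xs, trim=1):
    kept = sorted(xs)[trim:len(xs) - trim]
    return sum(kept) / len(kept)

sabotaged = [25, 25, 24, 25, 25, 24, 1]   # one malicious lowball
visionary = [25, 18, 18, 17, 18, 18, 17]  # one juror hears a genius nobody else does
print(round(trimmed_mean(sabotaged), 2))  # 24.6 -- the saboteur is silenced...
print(round(trimmed_mean(visionary), 2))  # 17.8 -- ...and so is the visionary
```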
3. Borda Count (Ranking System)
Jurors rank contestants: first place earns one point, second two, and so on; the lowest total wins. Ranking is essentially the mean stripped of nuance, recording order but not distance. When no clear standout exists, the system often rewards the contestant with the most 2s and 3s, and, like the mean, it still places too much faith in the professionalism of every juror.
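A toy tally, with invented rankings, shows how steady seconds beat divided firsts:

```python
# Hypothetical rankings from five jurors (1 = first place); lowest total wins.
rankings = {
    "Polarizing": [1, 1, 1, 3, 3],  # three firsts, two last places
    "Steady":     [2, 2, 2, 1, 1],  # never lower than second
    "Third":      [3, 3, 3, 2, 2],
}
totals = {name: sum(ranks) for name, ranks in rankings.items()}
print(totals)                       # {'Polarizing': 9, 'Steady': 8, 'Third': 13}
print(min(totals, key=totals.get))  # 'Steady' -- three first places lose to steady seconds
```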
At the finals of the 2016 Wieniawski Violin Competition, this vulnerability became visible in the data: two distinct voting blocs emerged, roughly aligned with Zakhar Bron and Maxim Vengerov, each ranking its favored candidate first and the rival nearly last. The published jury sheets made the split undeniable, and later statistical analysis by economists Honorata Sosnowska and Krzysztof Kontek formally confirmed it. Ironically, it was the “second-prize” laureate, Bomsori Kim, who went on to international stardom—while the episode itself became the case study that inspired Kenner and Kontek’s later reforms.
4. Yes/No (Approval Voting)
Common in preliminary rounds, the yes-or-no vote is quick, decisive—and utterly flattening. A weak “yes” counts the same as a strong one, and a single hesitant “no” can send a contestant home early. According to later accounts by music critic Janusz Ekiert, it was precisely such a binary vote that eliminated Ivo Pogorelich from the 1980 Chopin Competition—an outcome that prompted Martha Argerich to resign in protest and create a legend.
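The flattening is easy to see in a toy tally of hypothetical votes:

```python
# Hypothetical yes/no votes from a seven-member panel.
votes = {
    "Safe":   ["yes", "yes", "yes", "yes", "yes", "no", "no"],  # five polite nods
    "Daring": ["yes", "yes", "yes", "yes", "no", "no", "no"],   # four jurors are ecstatic
}
tally = {name: v.count("yes") for name, v in votes.items()}
print(tally)  # {'Safe': 5, 'Daring': 4} -- intensity of conviction never enters the count
```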
5. Normalized Scores
To counteract the brute simplicity of the previous methods, we can adjust all jurors' scores to a common scale; this is called normalization. The Z-score method, for example, subtracts each juror's average from every score and divides by that juror's standard deviation, so all scores land on one shared statistical scale; in effect, every juror's ruler is resized to the same length. But consider Juror A, who scores everyone between 21 and 25 (“all good”), versus Juror B, who ranges from 5 to 20 (“bad to average”). After normalization, A's 21 and B's 5 both land at the bottom of the shared scale, and A's 25 and B's 20 both land at the top. Their nuanced judgments vanish.
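In code, with our two hypothetical jurors reduced to their extremes:

```python
import statistics

def z_scores(xs):
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    return [(x - mu) / sigma for x in xs]

juror_a = [21, 25]  # "everyone was good"
juror_b = [5, 20]   # "bad to average"
print(z_scores(juror_a))  # [-1.0, 1.0]
print(z_scores(juror_b))  # [-1.0, 1.0] -- identical, though the opinions differ wildly
```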
The C-Mean, used in the 2025 Chopin Competition, goes further: it pulls each juror's scores toward the contestant's overall mean before recalculating. This can disastrously flip results. Suppose Juror C gives 18 and 16 to two contestants whose overall averages are 15 and 21. The competition's ±2 rule confines every score to within two points of the contestant's mean, so the allowed ranges become 13–17 and 19–23. C's 18 is pulled down to 17 and the 16 up to 19, reversing Juror C's original preference! In fact, under the C-Mean it is entirely possible for the contestant with the most first-place scores not to receive first place.
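The mechanism, sketched as a simple clamp (an illustration, not the competition's official code):

```python
# Clamp each score to within `band` points of the contestant's overall mean.
def adjust(score, contestant_mean, band=2):
    return min(max(score, contestant_mean - band), contestant_mean + band)

# Juror C preferred the first contestant: 18 > 16...
print(adjust(18, 15))  # 17
print(adjust(16, 21))  # 19 -- ...yet after adjustment the preference reads 17 < 19
```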
The Inescapable Dilemma
Every scoring formula, elegant or brute, translates human judgment into numbers, and every translation distorts. That is the essence of Arrow's Impossibility Theorem, the cornerstone of social-choice theory: no method can satisfy every reasonable criterion of fairness at once. The question, then, is not which system is flawless, but which flaws we can accept. With that in mind, here are four principles that could help competitions strike a better balance between art and arithmetic.
1. Normalize
Every juror uses a different scale. While the Z-score feels soulless, a refined version can preserve each juror’s artistic judgment. For the 2025 U.S. National Chopin Competition, Kenner and Kontek introduced Juror Score Transposition (JST), which normalizes scales while maintaining intention and ranking—a nuanced form of standardization.
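The precise JST formula is beyond this article's scope; purely as an illustration of the idea, here is a hypothetical linear transposition that preserves each juror's ranking and relative gaps, though not the full "intention" the real method aims to keep:

```python
import statistics

# NOT the published JST algorithm: a hypothetical stand-in that transposes each
# juror's scores linearly onto a shared mean and spread, preserving the juror's
# internal ranking and the relative gaps between scores.
def transpose(scores, target_mean=20.0, target_sd=3.0):
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return [target_mean + target_sd * (x - mu) / sigma for x in scores]

juror_a = [21, 22, 24, 25]  # a compressed "everyone was good" scale
print([round(s, 1) for s in transpose(juror_a)])
# [16.2, 18.1, 21.9, 23.8] -- order and proportions intact, scale now shared
```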
2. Neutralize
Outlier scores—suspiciously high or low—must not dictate the outcome. When a juror’s marks drift beyond statistical reason, they should be flagged and excluded. As shown in the 2023 Szymanowski Competition, the mere possibility of exclusion can restore integrity to the process.
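The actual Kenner and Kontek procedure is more sophisticated, but a minimal sketch of the principle, using an invented panel and an invented threshold, might look like this:

```python
import statistics

# Measure each juror's average distance from the panel consensus and flag anyone
# beyond a multiple of the panel's median drift (the median, so the outlier
# cannot inflate the yardstick used to catch them).
def flag_outlier_jurors(score_table, threshold=2.0):
    """score_table[j][c] = juror j's score for contestant c."""
    contestants = range(len(score_table[0]))
    consensus = [statistics.mean(row[c] for row in score_table) for c in contestants]
    drift = [statistics.mean(abs(row[c] - consensus[c]) for c in contestants)
             for row in score_table]
    yardstick = statistics.median(drift)
    return [j for j, d in enumerate(drift) if d > threshold * yardstick]

panel = [
    [22, 22, 23],
    [21, 19, 24],
    [23, 21, 22],
    [8, 25, 7],   # consistently far from everyone else
]
print(flag_outlier_jurors(panel))  # [3]
```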
3. Season
Not all disagreement is nefarious. Great artists often divide juries. A modest “controversy bonus,” based on standard deviation, could reward candidates whose performances provoke strong but polarized reactions—capped so as not to surpass consistently high scorers. This honors artistic risk and prevents normalization from devolving into bland consensus. Competitions should discover flavor, not eradicate it.
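A minimal sketch of such a bonus, with invented weight and cap:

```python
import statistics

# A small bonus for score dispersion, capped so a divisive performance can nose
# ahead of bland consensus but never leapfrog a contestant whom the whole jury
# rated clearly higher.
def seasoned_score(scores, weight=0.3, cap=0.5):
    base = statistics.mean(scores)
    bonus = min(weight * statistics.pstdev(scores), cap)
    return base + bonus

safe     = [21, 21, 22, 21, 21]  # consensus, no sparks
divisive = [25, 16, 25, 17, 24]  # the hall argued about this one at intermission
print(round(seasoned_score(safe), 2))      # 21.32
print(round(seasoned_score(divisive), 2))  # 21.9 -- ahead, but only by the capped bonus
```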
4. Publish
Make everything public: raw scores, normalized scores, removed outliers, and bonuses. Sociologist Susan McLaren, who studied judging scandals in sport, found that jurors and audiences can accept imperfection, even controversy, if one condition is met: transparency. Clear rules, impartial juries, and a process for oversight and appeal are what sustain faith in the result. As in a legal system, when people understand how a decision was made, they can respect it even if they disagree.
Competitions could go further and apply advanced statistical models. The Many-Facet Rasch Model, widely used in education and psychology to detect hidden patterns among raters, can surface jury collusion and bias; Bayesian analysis can track a juror's behavior across competitions and expose deviations from their own norms. These methods act as forensic detectives, revealing secrets in near real time and giving us answers today instead of through interviews years later.
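Purely to give a flavor of the Bayesian approach, here is a deliberately simplified sketch, with invented numbers and nothing like a full Rasch analysis, that tracks one juror's drift from consensus across competitions:

```python
# Maintain a posterior over one juror's systematic deviation from the panel
# consensus and update it after each competition.
def update_bias(prior_mean, prior_var, observed_dev, obs_var=1.0):
    # Conjugate normal update with known observation variance.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + observed_dev / obs_var)
    return post_mean, post_var

mean, var = 0.0, 4.0              # start neutral: no bias assumed
for dev in [0.5, 2.8, 3.1, 2.9]:  # juror keeps scoring ~3 points above consensus
    mean, var = update_bias(mean, var, dev)
print(round(mean, 2), round(var, 2))  # 2.19 0.24 -- a persistent upward bias emerges
```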
Consider the 1994 Tchaikovsky Piano Competition: second prize to a young Nikolai Lugansky, no first prize awarded. As pianist Vera Gornostaeva later revealed, “Lugansky’s artistry was beyond doubt. But the jury was still fighting ghosts—ghosts of influence, envy, and school.” They were punishing not him, but his late teacher Tatiana Nikolaeva. Thirty years on, time has proven Lugansky one of his generation’s great artists—which today raises the question: does Lugansky need the Tchaikovsky Competition, or does the competition need Lugansky?
In an age when young artists look to competitions for validation, perhaps it is the competitions that need to validate themselves—and rise to the level of the competitors. For the future of music, we should do better—and we can.
So, who’s stopping us?
References
Arrow, K. J. (1951). Social choice and individual values. Yale University Press.
Ekiert, J. (1980, October). Chopinowski konkurs 1980: Pogorelich i Argerich [The 1980 Chopin Competition: Pogorelich and Argerich]. Polityka.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). CRC Press.
Gornostaeva, V. (1994). Interview in Muzykalnaya zhizn’ [Musical Life], issue on the Tchaikovsky International Piano Competition. Moscow: Union of Composers of Russia.
Kenner, K., & Kontek, K. (2023). Statistical methods for ensuring fairness in music competitions: The Szymanowski model. Presentation at the Karol Szymanowski International Music Competition, Katowice, Poland.
Kontek, K., & Kenner, K. (2025). Identifying outlier scores and outlier jurors to reduce manipulation in classical music competitions. Journal of Cultural Economics, 49(1), 49–98. https://doi.org/10.1007/s10824-023-09494-7
Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
Lloyd Webber, J. (2013, October 14). Classical music competitions are rife with corruption. The Telegraph. https://www.telegraph.co.uk/
Lom, S. E. (2024). Judging by the rules? The emergence of evaluation practices. Valuation Studies, 11(1), 91–137. https://doi.org/10.3384/VS.2001-5992.2024.11.1.91-137
Manturzewska, M. (1990). Psychological determinants of the artistic development of musicians. In M. Manturzewska & A. Miklaszewski (Eds.), Psychology of music: Studies and research (pp. 45–63). PWN.
Manturzewska, M. (2011). The reliability of evaluation of musical performance by music experts. Interdisciplinary Studies in Musicology, 10, 98–110. https://bibliotekanauki.pl/articles/780235.pdf
McLaren, S. (2016). Scandals and systems: Transparency in sport adjudication. International Journal of Sport Policy and Politics, 8(2), 233–248. https://doi.org/10.1080/19406940.2015.1123754
Myford, C. M., & Mislevy, R. J. (1995). Monitoring and improving a portfolio assessment system. Educational Assessment, 3(1), 41–85. https://eric.ed.gov/?id=ED388725
Sosnowska, H., & Kontek, K. (2017). Statistical analysis of jury voting in the 2016 Wieniawski Violin Competition. Ruch Muzyczny, 61(11), 12–18.