The Epica Awards just unveiled the "AIJE" (Artificial Intelligence Jury Experiment), a new initiative designed to assess the potential of AI in understanding and evaluating creative concepts.
The experiment, run separately from the main awards (which are judged by a panel of 150 human journalists), used its own AI-driven evaluation process.
In its initial version, the AI relied solely on textual descriptions provided by entrants, focusing on shortlisted entries with easily explainable concepts. Entrants were given a standardization tool to distill their creative ideas into concise descriptions suitable for AI processing.
To maintain consistency with human jurors, the AI was fed with descriptions bundled by category, along with a prompt containing category details and the Epica Awards' scoring scale ranging from 1 (Damaging) to 10 (World Beating).
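As a rough illustration, the bundling step might look like the following Python sketch. The function name, prompt wording, and structure are hypothetical; the actual AIJE prompt has not been published, and only the 1–10 scale endpoints come from the article:

```python
SCALE = "Score each entry on the Epica scale from 1 (Damaging) to 10 (World Beating)."

def build_category_prompt(category, category_details, entries):
    """Bundle all entry descriptions for one category into a single prompt,
    mirroring how human jurors see entries grouped by category.
    (Hypothetical structure, for illustration only.)"""
    lines = [f"Category: {category}", category_details, SCALE, ""]
    for i, desc in enumerate(entries, 1):
        lines.append(f"Entry {i}: {desc}")
    lines.append("Return a score and a short justification for each entry.")
    return "\n".join(lines)

prompt = build_category_prompt(
    "Food & Drink",
    "Campaigns promoting food and drink brands.",
    ["A vending machine that trades recyclables for snacks.",
     "A label that changes color as the drink reaches ideal temperature."],
)
```

A prompt assembled this way would then be sent to the model once per run, with the same bundle reused across runs.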
The GPT-4 Turbo API generated scores and text justifications for each entry in 80 runs, with the final results averaged using the interquartile range method to eliminate outliers.
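The averaging step described above can be sketched in Python: scores falling outside the interquartile range are discarded as outliers before the mean is taken. This is a minimal sketch of the general technique; the run data below is invented for illustration:

```python
import statistics

def iqr_trimmed_mean(scores):
    """Average a list of run scores after discarding outliers
    that fall outside the interquartile range (Q1..Q3)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    kept = [s for s in scores if q1 <= s <= q3]
    return statistics.mean(kept)

# 80 simulated run scores for one entry, including a few outlier runs.
runs = [6] * 20 + [7] * 30 + [8] * 20 + [1] * 5 + [10] * 5
final_score = iqr_trimmed_mean(runs)  # outlier 1s and 10s are dropped
```

Trimming by the interquartile range keeps a single aberrant run (a model refusing, or scoring wildly high) from dragging an entry's final average.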
How Did the AIJE Fare?
The AIJE experiment revealed a correlation coefficient of approximately 0.25 with human voting patterns.
Notably, the AI awarded higher scores, averaging 7.45, compared to the human jurors' average of 6.60. This highlighted a fundamental difference in evaluation approaches between the two, with journalists exhibiting a tougher scoring tendency.
The AI's higher scores suggested it was more easily impressed; at the same time, it was free of the personal biases that can color human judgment.
Nicolas Huvé, the Epica Awards Operations Director and creator of AIJE, acknowledged the promising correlation observed in initial tests but noted a noticeable discrepancy during the live experiment.
"Though not surprising, all these entries were already deemed high quality by a human jury," he added.
While journalists demonstrated the ability to identify originality, AIJE showcased efficiency in evaluating campaigns strictly within their category scope.
Huvé highlighted the potential of AIJE in future iterations, revealing plans to include more categories and visuals.
"Journalists, known for their critical analysis, are generally tougher in their scoring. In contrast, AIJE tended to be more easily impressed. In the jury room, journalists could identify ideas that had been done before in some way, whereas AIJE perceived novelty," Huvé noted.
Huvé also said he preferred a general-purpose AI approach to avoid the "feedback loop" prevalent in the creative industry, adding that, unlike human jurors, AIJE was "not influenced by such human biases" as preferences for or against particular kinds of work.
The experiment yields valuable insights into the potential role of AI in evaluating creativity.
Future versions of AIJE will expand to cover additional categories and incorporate visual elements. In the 2024 Epica Awards, participants will automatically qualify for involvement in the next iteration of AIJE.