The Great Reality Check Part 1: Acute cerebral hemorrhage
Read the results of our new user studies – up-to-date and transparent!
Purpose:
The aim of the study was to prospectively determine the performance of common AI assistants in detecting acute cerebral hemorrhage. Results were validated against the first-read reports of radiologists specializing in emergency radiology and against imaging and clinical follow-up.
Patients, Materials and Methods:
In 2025, 218 patients referred to ERS Emergency Radiology Schueller, a teleradiology provider, for cranial CT after blunt head trauma were randomly and prospectively enrolled over eight consecutive weeks. Each patient's CT study was randomly assigned to one of two common, commercially available AI assistants for evaluation. Radiologists reported the CT studies while blinded to the AI results, and then compared their findings with the AI findings in a second step. The gold standard consisted of the specialists' reports and clinical follow-up. Whenever the radiologists' and the AI assistants' findings disagreed, the CT study was second-read within 30 minutes.
Results:
Of the 218 patients, AI results could not be retrieved for 18. In the remaining 200 patients, radiologists and clinical follow-up diagnosed 58 acute intracranial hemorrhages (29%). The AI assistants yielded 58 true positive (TP), 0 false positive (FP), 40 false negative (FN), and 82 true negative (TN) results: sensitivity .592; specificity 1.0; positive predictive value (PPV) 1.0; negative predictive value (NPV) .672. There was no significant difference between the two AI assistants. All FN findings were hemorrhages 5 mm wide or less (mean 3.5 mm, SE ± 1.9 mm). The smallest hemorrhage classified as TP by the AI assistants measured 5 mm (range 5–15 mm; mean 9 mm; SE ± 7 mm).
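The four accuracy measures follow directly from the 2×2 counts. The Python sketch below uses the TP, FP, FN, and TN counts reported above and reproduces the published values. The function name `diagnostic_metrics` is illustrative only and is not part of the study's software.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 contingency table."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true hemorrhages flagged by the AI
        "specificity": tn / (tn + fp),  # share of hemorrhage-free scans the AI called negative
        "ppv": tp / (tp + fp),          # share of AI-positive calls that were correct
        "npv": tn / (tn + fn),          # share of AI-negative calls that were correct
        "fnr": fn / (tp + fn),          # false-negative rate = 1 - sensitivity
    }


# Counts as reported in the Results section.
metrics = diagnostic_metrics(tp=58, fp=0, fn=40, tn=82)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The false-negative rate (1 − sensitivity, about .408) is the figure behind the concern about missed small hemorrhages raised in the Discussion.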
Discussion:
The AI assistants produced no false-positive results: every hemorrhage they flagged was confirmed (PPV 1.0). This suggests that the software companies have addressed typical pitfalls such as beam-hardening artifacts, bone margins, and calcifications along the inner table of the skull. However, the surprisingly high FN rate suggests that AI assistants are currently not suitable for triaging patients with traumatic brain injury in the high-end setting of professional teleradiology. The high FN rate, particularly for hemorrhages of 5 mm or less, also casts doubt on their use as a second reader in the hectic daily routine of acute radiology. Our results differ from the official figures published by AI manufacturers, who report sensitivity and specificity of at least 90%. The relatively small sample size of our study is a limitation and may contribute to this discrepancy. Future studies should also test a larger number of AI assistants.

