Diagnostic performance comparable for two test sets from the Personal Performance in Mammographic Screening scheme.
The diagnostic performance of an artificial intelligence (AI) algorithm is comparable to that of human readers for interpreting screening mammograms, according to a study published online Sept. 5 in Radiology.
Yan Chen, Ph.D., from the University of Nottingham in the United Kingdom, and colleagues compared the performance of human readers and a commercially available AI algorithm for interpreting the Personal Performance in Mammographic Screening (PERFORMS) scheme test sets. Two test sets, each including 60 challenging cases, were assessed by human readers and an AI algorithm. Performance was assessed using the highest score for each breast; metrics included sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
A total of 552 human readers interpreted the test sets, which included 161, 70, and nine normal, malignant, and benign breasts, respectively. The researchers found no difference between the AUC for AI and human readers (0.93 and 0.88, respectively; P = 0.15). No difference was seen for AI versus human readers in sensitivity (84 versus 90 percent; P = 0.34) when using the developer’s suggested recall score threshold, but AI had higher specificity than human readers (89 versus 76 percent; P = 0.003). Due to the size of the test sets, equivalency could not be demonstrated. AI showed no differences in performance when using recall thresholds to match human reader performance, with sensitivity and specificity of 91 and 77 percent (P = 0.73 and 0.85, respectively).
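To illustrate why the choice of recall threshold matters, the following is a minimal sketch of how sensitivity and specificity shift when the threshold on a per-breast score is changed. The function name, the scores, and the thresholds are all hypothetical and made up for illustration; they are not data or code from the study, and treating every nonmalignant breast as a negative is a simplifying assumption.

```python
def sensitivity_specificity(scores, is_malignant, threshold):
    """Return (sensitivity, specificity) when a breast is recalled
    if its score is at or above the threshold."""
    tp = fn = tn = fp = 0
    for score, malignant in zip(scores, is_malignant):
        recalled = score >= threshold
        if malignant and recalled:
            tp += 1      # cancer correctly recalled
        elif malignant:
            fn += 1      # cancer missed
        elif recalled:
            fp += 1      # nonmalignant breast recalled unnecessarily
        else:
            tn += 1      # nonmalignant breast correctly not recalled
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores for eight breasts (NOT data from the study).
toy_scores = [0.95, 0.80, 0.55, 0.30, 0.60, 0.40, 0.20, 0.10]
toy_malignant = [True, True, True, True, False, False, False, False]

# A higher threshold recalls fewer breasts; a lower one recalls more.
strict = sensitivity_specificity(toy_scores, toy_malignant, threshold=0.5)
lenient = sensitivity_specificity(toy_scores, toy_malignant, threshold=0.25)
print(strict, lenient)
```

In this toy example, lowering the threshold raises sensitivity at the cost of specificity, which is the kind of trade-off involved when a threshold is adjusted to match human reader performance.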
“The results of this study provide strong supporting evidence that AI for breast cancer screening can perform as well as human readers,” Chen said in a statement.