Abstract
The purpose of this study was to review 17 articles, published between January 2023 and November 2023, that examined the performance of AI detectors in differentiating between AI-generated and human-written texts. Employing a slightly modified version of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol, together with an aggregated set of quality evaluation criteria adapted from the A MeaSurement Tool to Assess systematic Reviews (AMSTAR) tool, the study was conducted from 1 October 2023 to 30 November 2023 and was guided by six research questions. Searches were carried out on eleven online databases, two Internet search engines, and one academic social networking site. The authorship of the 17 reviewed articles spanned twelve countries in both the Global North and the Global South. ChatGPT (in its GPT-3.5 and GPT-4 versions) was either the sole AI text generator used or one of the generators in studies that employed more than one. Crossplag was the top-performing AI detection tool, followed by Copyleaks; Duplichecker and Writer performed worst in the instances where they were used. A major finding across the 17 reviewed articles was the inconsistent detection efficacy of both the tested AI detectors and the tested anti-plagiarism detection tools, with neither set of tools proving reliable. Accordingly, this study recommends using contemporary AI detectors and traditional anti-plagiarism detection tools, together with human reviewers/raters, in the ongoing effort to differentiate between AI-generated and human-written texts.