Turnitin’s AI writing detection tool

Turnitin’s artificial intelligence (AI) writing detection tool is accessible to staff within Turnitin’s similarity report function. The tool works on text produced in English and includes:

  • A writing detection indicator (percentage of text predicted to be AI-generated)
  • A report (highlighted text in the submission that maps to this percentage)

Due to system constraints, neither the AI writing detection indicator (percentage) nor the report (highlighted text) is visible to students.

There has been considerable debate and conjecture about the validity and error rates of AI detection tools, particularly about potential injustices flowing from false-positive results. Much of this discussion has been unrelated to the Turnitin AI detector, focusing instead on freely available online tools known to be of poor quality. Concerns about false positives are also often expressed on the erroneous assumption that high scores automatically lead to allegations or findings of misconduct. The University continues to monitor emerging research into AI detection and the validity of the Turnitin tool and has processes in place to mitigate the risks posed by false positives.

How the Turnitin artificial intelligence writing detection tool works

The detection tool looks for highly predictable language patterns to identify passages potentially generated by AI. It has been trained on a representative sample of data that includes both AI-generated text and authentic academic writing drawn from the Turnitin archive across geographies and subject areas, spanning roughly two decades. This differentiates it from other AI writing detectors, such as ZeroGPT, as no other company holds such a large quantity of student-authored, non-AI-generated text. To minimise potential bias, care was taken during dataset construction to represent statistically underrepresented groups, such as second-language learners, English users from non-English-speaking countries, students from universities with diverse enrolments, and less common subject areas. Since the tool’s release in April 2023, it has been increasingly ‘field tested’, leading to a growing understanding of its uses and limitations.

Validity testing by Turnitin

Turnitin released a white paper (staff only) in August 2023 detailing the confidence rate of their AI detection tool. They tested the system using two metrics: the rate at which the system correctly identified AI-authored work, known as Recall, and the rate at which the system incorrectly identified human-authored content as AI, known as the false positive rate (FPR). A document is considered AI-written where the detector score is higher than 20%.

Turnitin tested Recall using a dataset of approximately 7,000 documents containing a mix of human- and AI-authored material. Turnitin measured Recall at 84.2% at the document level; that is, if 1,000 documents were written by AI, 842 of them would be correctly flagged as such by the detector. Recall at the sentence level has been measured at 92.3%. FPR was tested using 800,000 papers submitted before 2019, which are assumed to be human-written, and was measured at 0.7% at the document level. That is, of 1,000 human-authored documents, 7 would be incorrectly identified as AI-written. Sentence-level FPR has been measured at 0.2%.
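The two metrics above can be made concrete with a short sketch. The following is illustrative only (it is not Turnitin's implementation, and the evaluation data is hypothetical): it computes document-level Recall and FPR from detector scores and ground-truth labels, using the 20% flagging threshold described above.

```python
def recall_and_fpr(results, threshold=0.20):
    """results: list of (detector_score, is_ai) pairs, where is_ai is the
    ground-truth label. A document is flagged when its score exceeds threshold."""
    tp = fn = fp = tn = 0
    for score, is_ai in results:
        flagged = score > threshold
        if is_ai:
            tp += flagged       # AI document correctly flagged
            fn += not flagged   # AI document missed
        else:
            fp += flagged       # human document wrongly flagged
            tn += not flagged   # human document correctly passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr

# Hypothetical dataset mirroring the reported document-level rates:
# 842 of 1,000 AI documents flagged (Recall 84.2%),
# 7 of 1,000 human documents flagged (FPR 0.7%).
ai_docs = [(0.9, True)] * 842 + [(0.1, True)] * 158
human_docs = [(0.9, False)] * 7 + [(0.1, False)] * 993
recall, fpr = recall_and_fpr(ai_docs + human_docs)
```

With these hypothetical inputs, `recall` is 0.842 and `fpr` is 0.007, matching the document-level figures quoted above.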

Testing has found that validity is affected by the volume of text in work that mixes AI-generated and human-written material. Accuracy increases with volume, so the tool should not be used on passages shorter than 300 words.

Monitoring by the University

The University continues to monitor emerging research, outputs and cases from Turnitin’s AI detection tool, through direct correspondence with Turnitin and peer-reviewed papers. Turnitin has recognised that it needs to increase the breadth of generative AI paraphrasing and AI rewriting tools that can be detected. The University has prepared separate advice on the use of AI-based editorial tools.

Using and interpreting results from the AI writing detection tool

While there are significant benefits in the University’s use of the Turnitin AI writing detection tool, its limitations should be considered when using the tool and when interpreting and acting on its reports.

As a guide, staff may want to consider:

  • Only using the tool for pieces of writing that exceed 300 words in length
  • Focusing on writing detection indicator results where 20% or more of the text is predicted to be AI-generated
  • Using the tool on an entire piece of written work, rather than relying on the tool’s sentence-by-sentence determination of sources
  • Being mindful of whether the submitted work is for an assessment task, or in a discipline, that tends to require formulaic expression, as this might increase the likelihood of a false positive
  • Considering whether the student’s style of written expression tends to be regular, routine or formulaic, as this might increase the likelihood of a false positive

How to proceed if you see a high AI detection score

As with Turnitin’s Similarity Report, a high AI detection score is not proof that academic misconduct has taken place.

Staff should see the AI detection tool as just one of a number of ways to determine whether students have used AI inappropriately in the preparation of assessment material (see below).

A high AI detection score alone does not constitute grounds for making an allegation of academic misconduct. Staff must seek further evidence from other sources before an allegation of academic misconduct can be raised with students.

These sources of evidence could include:

  • False references, facts and other types of AI-generated ‘hallucinations’
  • A considerable difference in the language used and/or the specific knowledge and ability in the subject area shown in the student’s submitted assessment compared to the student’s previous work in the subject
  • The use of language in the student’s submitted assessment that is not typical of the subject, course or award level
  • Inconsistency in the style of presentation across the student’s submitted assessment, potentially indicating that the work has been compiled using multiple ‘authors’
  • Inclusion in the student’s submitted assessment of warnings or other caveats typically inserted by AI, particularly the prompt ‘regenerate text’
  • Unusual contents in the file metadata

In gathering further sources of evidence staff are able to speak to students, but this should be done with care, and with reference to your faculty’s preferred process. In any preliminary discussions with students it is important that staff do not make any allegations of academic misconduct.

As per current policy, students have been advised that staff may ask to speak with them about how they have completed assessment tasks, and to explain the content of material they have prepared. Staff may ask students to share drafts, notes or other evidence of authorship and discuss with them their understanding of the topics that are covered in the subject and its assessment.

Making an allegation

The process for making an allegation differs in each faculty; if you are unclear on the process in your faculty, please seek further advice.

In line with the University’s statement on the use of artificial intelligence software in the preparation of materials for assessment, any allegation must be based on evidence surfaced through the investigation process. To reiterate, the results produced by the AI writing detection tool cannot be the only piece of evidence relied upon in an academic misconduct allegation.