AI transcription for Australian solicitors: a practical guide
What 'evidence-grade' actually means for AI transcripts in NSW, Victoria and Queensland courts, and the small workflow changes that get Whisper-class output admitted without a fight.
· legal · transcription · australia · evidence · compliance
A solicitor I spoke with last winter, at a small commercial firm in North Sydney with about forty fee earners, was running a quiet experiment. Half her file notes from client conferences were dictated, sent to a Filipino typing service, and came back in 6–8 hours. The other half she recorded on her phone and dropped into an AI transcription tool over lunch. The AI transcript came back in roughly two minutes.
She told me she kept both running because she was nervous about admissibility. She had a hearing booked for May and didn't yet know which version she'd attach to her brief.
That hesitation is doing the rounds across Australian firms. Whisper-class transcription is fast and cheap, and it's now usually accurate enough to read straight through. The remaining doubt isn't really about accuracy any more. It's about evidence rules, client confidentiality, and what happens when the other side's barrister asks where the audio file lives.
This piece walks through the parts that matter: what the Evidence Act 1995 (Cth) actually says about computer-produced documents, how the states differ, where the audio goes during transcription, the workflow choices that keep AI transcripts admissible without argument, and the bits that still need a human in the loop.
What 'AI transcription' actually does
Speech-to-text models like OpenAI's Whisper-large-v3 work by chunking audio into 30-second windows, converting each window into a log-Mel spectrogram (a numerical picture of the audio's frequency content over time), and then predicting the most likely sequence of text tokens. They don't transcribe phonemes the way a human transcriber does. They guess sentences.
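To make the windowing concrete, here is a minimal sketch of how a recording's duration maps onto fixed 30-second windows. It is a simplification for illustration: the function name `window_bounds` is hypothetical, and real implementations typically pad the final window and may move window boundaries rather than cutting at exact multiples of 30 seconds.

```python
def window_bounds(duration_s: float, window_s: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for consecutive fixed-length windows.

    Illustrative only: the last window is simply cut short here, whereas a
    real model would pad it out to the full window length.
    """
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + window_s, duration_s)))
        start += window_s
    return bounds


if __name__ == "__main__":
    # A 75-second voice memo falls into three windows, the last one partial.
    for start, end in window_bounds(75.0):
        print(f"{start:6.1f}s to {end:6.1f}s")
```

The practical point for a solicitor is that a phrase straddling a window boundary is one of the places a transcript is more likely to need a human check.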
That distinction matters in court. A human transcriber who hears "the defendant said he didn't know about the contract" types those exact words. An AI model produces what its training distribution says is the most probable English sentence given the audio waveform. Most of the time those two outputs match. When they don't, the AI version tends to be more grammatically polished and slightly more confident than the recording deserved.
For routine matters — file notes, conference summaries, prep transcripts — this is fine. For evidentiary tendering you want one more step.
The Evidence Act answer: s 48, s 69, and the s 79 expert evidence escape hatch
Under s 48 of the Evidence Act, a party may adduce the contents of a document by tendering the document itself, or by tendering a copy "produced … by a device that reproduces the contents of documents". Section 146 then gives a presumption: if a document is produced by a device that, if used properly, ordinarily produces a particular outcome, the device is presumed to have produced that outcome on this occasion unless the other side adduces evidence sufficient to raise doubt.
In practice, audio recordings are tendered routinely under this framework. The AI transcript is the harder question because it's a new document derived from the audio. The cleanest path is to:
- Tender the audio as the primary exhibit.
- Tender the AI transcript as an aide-mémoire — explicitly not as the source — and offer the transcribing party as a witness to its production.
- Keep the raw transcript and the corrected transcript both, with timestamps.
The third step is the one most firms get wrong. If you give the court a polished transcript without preserving the model's first-pass output, you've quietly destroyed the chain of custody for the most basic admissibility argument: that this transcript represents the audio.
Section 69 (business records) becomes available when the transcript is created in the ordinary course of business. It is most useful when the person who produced the transcript is no longer available to give evidence about it. For commercial matters where the recording was made by a paralegal and the brief is being prepared months later, this is often a cleaner pathway than s 146.
If accuracy is challenged, s 79 lets you call an expert to give evidence about how the transcription system works. We've had two firms ask us for written statements on Whisper's expected word error rate on Australian English — we'll always provide one for active matters. It's not unusual; it's exactly the same thing the printers of an exhibit photograph would do if asked.
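For readers who want to see what a "word error rate" figure actually measures, here is a minimal sketch of the standard calculation: the word-level edit distance between a human-checked reference and the model's output, divided by the number of reference words. The function name is hypothetical, and the normalisation (lower-casing and splitting on whitespace only) is an assumption. Published benchmarks usually normalise punctuation and contractions as well, which changes the number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    heard = "the defendant said he didn't know about the contract"
    model = "the defendant said he did not know about the contract"
    print(f"WER: {word_error_rate(heard, model):.3f}")
```

Note that the example counts "didn't" versus "did not" as two errors out of nine words, even though the meaning is unchanged. That is why a bare WER figure needs context before it is put to a court.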
State-by-state variations that catch people out
The Commonwealth Evidence Act applies in federal courts. NSW and Victoria have closely mirrored state Acts, as do Tasmania, the ACT and the Northern Territory. Queensland, Western Australia and South Australia each have their own framework with subtle differences.
The two practical differences worth knowing:
Queensland — under the Evidence Act 1977 (Qld) s 95, computer-produced documents have a more prescriptive admissibility test. The party tendering must satisfy the court that the computer was operating properly, that the relevant information was supplied to the computer in the ordinary course of activities, and that the output reproduces the relevant facts. For AI transcription that's a slightly higher bar because the "ordinary course of activities" language assumes deterministic processing. A short affidavit from the transcribing solicitor or paralegal usually covers it.
Western Australia — the Evidence Act 1906 (WA) doesn't have a direct equivalent to s 146 of the Cth Act. Computer-produced output is treated under common-law rules of admissibility, which historically meant a fight over hearsay objections. The Perth firms we work with handle this by always recording the original audio in addition to the transcript, and tendering both.
NSW and Victoria are the easiest jurisdictions because the Evidence Acts are uniform with the Commonwealth.
Client privilege and where the audio lives
The bigger compliance question in 2025 isn't admissibility — it's where the audio goes during transcription. If you upload a client conference to a US-based AI service, you've potentially:
- Transferred client information across borders (APP 8)
- Exposed it to US government access under the CLOUD Act
- Put your firm in breach of any data-handling clauses in your retainer
The NSW Bar's Recommendation on cloud-based legal technology (2024 revision) is explicit: solicitors should "take reasonable steps to ensure that information stored or processed in the cloud remains subject to Australian or equivalent privacy law". Practical effect: use a service whose servers are in Australia, or whose data-handling policy is materially equivalent to the APPs.
This is the reason [speechtotext.au](/) exists in the first place. Audio is processed in-memory in Australia and discarded immediately. There's no on-disk storage of the original recording, and the transcript itself is only persisted if the user is signed in. For signed-out use (one-off file drops), the transcript is returned inline and never saved anywhere.
If you're using a different tool, the questions to ask your vendor:
- Where is the audio file written to disk during processing?
- How long is it retained?
- Is it used to train future versions of the model?
- What's the data-residency setting and can it be locked to Australia?
A vague answer or a "we can't tell you" on any of these is a red flag.
A workflow that survives cross-examination
The Sydney solicitor I mentioned at the start eventually settled on this pattern. We've heard variations of it from a Brisbane family law firm and a small commercial practice in Adelaide:
Capture stage
- Record on a dedicated device (an $80 Sony ICD-PX is fine; phones are fine; Zoom recordings are fine if you toggle local recording on)
- Name the file with matter number + date + speakers, e.g. M-2025-0421_Smith_v_Jones_conference_18Apr.m4a
- Hash the file (SHA-256) and log the hash in the matter file the same day
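The hashing step takes a few lines with Python's standard library. The sketch below is one way to do it, not a prescribed method: the function names and the tab-separated log format are assumptions for illustration, and in practice the log entry would go wherever your matter file lives.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks
    so that long recordings don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_hash(audio_path: Path, log_path: Path) -> str:
    """Append 'UTC timestamp, file name, hash' to a plain-text log
    (hypothetical format) and return the hash."""
    file_hash = sha256_of_file(audio_path)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{audio_path.name}\t{file_hash}\n")
    return file_hash


if __name__ == "__main__":
    recording = Path("M-2025-0421_Smith_v_Jones_conference_18Apr.m4a")
    if recording.exists():
        print(log_hash(recording, Path("hash_log.tsv")))
```

If the audio is later questioned, re-running `sha256_of_file` on the exhibit and comparing the result to the logged value shows whether the file has changed since the day it was recorded.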
Transcription stage
- Upload to a transcription service whose data-residency you trust
- Save the raw model output verbatim — don't let anyone "tidy it up" yet
- File the raw output alongside the audio
Review stage
- Have a paralegal compare the raw transcript to the audio at 1.5× speed
- Annotate corrections in a separate revision (not by overwriting)
- Initial and date the corrections page
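One way to produce the corrections page without overwriting anything is to keep the raw and corrected transcripts as separate text files and generate a line-by-line diff between them. The sketch below uses Python's standard `difflib` module. The function name and file names are hypothetical, and it assumes both transcripts are plain text with one utterance per line.

```python
import difflib


def corrections_page(raw: str, corrected: str,
                     raw_name: str = "raw_transcript.txt",
                     corrected_name: str = "corrected_transcript.txt") -> str:
    """Return a unified diff listing every line the reviewer changed.

    Lines starting with '-' are the model's original output;
    lines starting with '+' are the reviewer's corrections.
    """
    diff = difflib.unified_diff(
        raw.splitlines(keepends=True),
        corrected.splitlines(keepends=True),
        fromfile=raw_name,
        tofile=corrected_name,
    )
    return "".join(diff)


if __name__ == "__main__":
    raw = "He said he did not know.\nThe contract was signed in March.\n"
    corrected = "He said he didn't know.\nThe contract was signed in March.\n"
    print(corrections_page(raw, corrected))
```

Because the diff is generated from the two files rather than typed by hand, it cannot drift from what was actually changed, and an empty result confirms the reviewer made no corrections.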
Brief preparation
- Tender the audio as the primary exhibit
- Attach the corrected transcript as a courtesy aide-mémoire
- Have the corrections page available if the opposing side requests it
This is the same chain-of-custody discipline you'd apply to any documentary exhibit. The AI bit doesn't change the principles; it just makes the first draft instant.
What still needs a human
Some material AI handles well — straightforward dialogue, clear Australian English, named places. Some it doesn't. The honest list:
- Multiple overlapping speakers. Whisper isn't a diarisation system; you'll get a single column of text with [crosstalk] markers if you're lucky. For witness conferences with three or more participants, hire a court-reporting service.
- Strong non-English accents or code-switching. If your client conferences move between English and Cantonese or Arabic, you'll lose chunks. We can route those segments through the multilingual model but it's not perfect.
- Heavy technical jargon outside common medical/legal terms. AI is good with "Pty Ltd", "ex parte", "ratio decidendi", "Subdivision 35-B of the Income Tax Assessment Act 1997". It's less good with hyper-specific pharmaceutical chemistry or specialist engineering vocabulary. Either way, a human should always review.
- Anything you'd swear an affidavit on. Affidavits are first-person and precise. Use AI for the file note that supports the affidavit, not for the affidavit itself.
Cost, time, and the small-firm calculus
Outsourced typing in Australia runs about $1.80–$2.50 per audio minute through specialist legal services. A 60-minute conference is roughly $120 by the time you've tipped, plus an 8-hour wait. AI transcription on speechtotext.au is included in the Pro plan ($19/month, 600 minutes). That's about $0.03 per minute if you use the full allowance, and the result is back in roughly 90 seconds.
The small-firm partners who've moved to AI-first generally do so because the speed changes how often they record. When file notes are free, you record everything. When they're $100 each, you only record the conferences that feel important, and you lose the ones that didn't, which is often where the actual evidence sits later.
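The arithmetic behind those figures is easy to rerun with your own volumes. The sketch below uses only the prices quoted above; the function names are hypothetical. Note that the flat-fee plan's per-minute cost depends on how many of the included minutes a firm actually uses.

```python
def outsourced_cost_range(audio_minutes: float,
                          low_rate: float = 1.80,
                          high_rate: float = 2.50) -> tuple[float, float]:
    """Cost range in dollars for outsourced typing at the quoted per-minute rates."""
    return (round(audio_minutes * low_rate, 2), round(audio_minutes * high_rate, 2))


def plan_cost_per_minute(monthly_fee: float, included_minutes: int,
                         minutes_used: float) -> float:
    """Effective per-minute cost of a flat monthly plan at a given usage level."""
    if not 0 < minutes_used <= included_minutes:
        raise ValueError("minutes_used must be between 0 and the plan allowance")
    return monthly_fee / minutes_used


if __name__ == "__main__":
    low, high = outsourced_cost_range(60)
    print(f"60-minute conference, outsourced: ${low:.2f} to ${high:.2f}")
    for used in (600, 300, 120):
        cost = plan_cost_per_minute(19.0, 600, used)
        print(f"Pro plan at {used} minutes/month: ${cost:.3f} per minute")
```

Even at a fifth of the allowance, the per-minute cost stays well under the outsourced rate, which is the comparison that matters for a small firm.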
We've put together a legal-specific page with the privacy posture and export formats firms tend to ask about. If your matter management system is LEAP, Smokeball, FilePro, or Actionstep, the PDF export drops in without conversion.
FAQs
Is AI transcription admissible in NSW courts?
Yes, under the same framework that admits any computer-produced document. The audio is the primary exhibit; the transcript is an aide-mémoire. See s 48 and s 146 of the Evidence Act 1995 (NSW).
Does Whisper handle Australian accents well?
Whisper-large-v3 (the model speechtotext.au uses) is trained on hundreds of thousands of hours of multilingual audio, with a sizeable Australian English subset. Broad and cultivated accents both work well; we publish a benchmark on regional accent accuracy in a separate piece.
Is client privilege preserved if I use cloud transcription?
Only if the cloud service handles your audio in accordance with Australian Privacy Principles. The two things to verify: data residency and retention. Services whose audio touches US servers may expose client data to CLOUD Act requests.
How do I challenge an AI transcript the other side tendered?
The usual route is a s 79 expert challenge — call evidence about the model's known error patterns and request the raw (uncorrected) transcript. The other side should be able to produce it; if they can't, it's a chain-of-custody point in your favour.
Can I use AI transcription for court hearings?
Court transcripts proper are produced by accredited reporters and are the official record. AI transcription is fine for your own working transcript of a hearing (e.g. for an interlocutory review the next morning), but it's not the court's transcript.
---
If you're a solicitor evaluating AI transcription for your practice, start a Pro trial or [email us](mailto:hello@speechtotext.au) — we'll happily walk through the data-handling agreement before you commit.