Requirements and standards for video and audio

Users need transcripts, captions and audio descriptions for accessibility. Work with experts to record and mix audio. Check branding, copyright and permissions requirements.

Make sure everyone can access audio and video

Audio-visual content must have:

  • a transcript for people who don’t want to watch the video or listen to the audio – and for search engine indexing
  • closed captions for people who can’t hear the dialogue and other sounds (symbol [CC])
  • an audio description for people who can’t see the video.

Audio description is narration. It gives important details that are only visual, usually during gaps in the dialogue.

Choose an accessible media player that:

  • can be operated with a keyboard and mouse
  • can display closed captions
  • adapts resolutions to support low bandwidth connections
  • works in landscape and portrait mode.

YouTube, Facebook, Vimeo and many other common video platforms often have this functionality.

Don’t set video or animation to autoplay. Don’t include anything that flashes more than 3 times in any one-second period.
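The flash limit above can be checked roughly in code. The sketch below assumes you already have a list of times (in seconds) at which a flash occurs; how those times are detected is outside its scope, and the function name is hypothetical. It only counts flashes in a sliding one-second window and does not replace a full assessment against the WCAG criterion.

```python
from typing import Sequence


def exceeds_flash_limit(flash_times: Sequence[float],
                        max_flashes: int = 3,
                        window: float = 1.0) -> bool:
    """Return True if more than max_flashes fall within any `window` seconds.

    flash_times is an assumed input: the times (in seconds) of each flash.
    """
    times = sorted(flash_times)
    start = 0
    for end, current in enumerate(times):
        # Move the start of the window forward until it spans less than `window`.
        while current - times[start] >= window:
            start += 1
        if end - start + 1 > max_flashes:
            return True
    return False
```

For example, 4 flashes inside 0.6 seconds would be reported as exceeding the limit, while flashes spaced half a second apart would not.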

Make sure the colour contrast of text in videos is at least 4.5:1.
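As an illustration, the contrast ratio can be calculated from the text and background colours using the relative luminance formula published in WCAG 2.x. This is a minimal sketch: the function names are hypothetical, and it assumes solid 8-bit sRGB colours, which is rarely true of moving video backgrounds.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def relative_luminance(colour: RGB) -> float:
    """Relative luminance of an 8-bit sRGB colour, as defined in WCAG 2.x."""
    def linearise(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(v) for v in colour)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(text: RGB, background: RGB) -> float:
    """Contrast ratio between two colours, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(text), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum_contrast(text: RGB, background: RGB,
                           minimum: float = 4.5) -> bool:
    """Check the colours against the 4.5:1 minimum."""
    return contrast_ratio(text, background) >= minimum
```

Because video backgrounds change from frame to frame, a check like this is most useful for text on a solid or tinted block.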

Normalise your audio to make sure each recording has the same volume.

Remove background sounds or make sure they are at least 20 decibels lower than the foreground speech content.
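The 20 decibel difference can be estimated by comparing the average (RMS) level of the speech with the level of the background. The sketch below assumes you have the two as separate, non-empty lists of samples, for example from separate tracks or from a passage of speech and a pause. The function names are hypothetical, and audio professionals will usually rely on metering tools instead.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square level of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(foreground: Sequence[float],
                        background: Sequence[float]) -> float:
    """How many decibels the foreground sits above the background."""
    fg, bg = rms(foreground), rms(background)
    if bg == 0:
        return math.inf   # silent background
    if fg == 0:
        return -math.inf  # silent foreground
    return 20 * math.log10(fg / bg)


def background_is_quiet_enough(foreground: Sequence[float],
                               background: Sequence[float],
                               minimum_db: float = 20.0) -> bool:
    """Check the background is at least minimum_db below the speech."""
    return level_difference_db(foreground, background) >= minimum_db
```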

Accessibility requirements

User needs:

  • I can access equivalent information to anything contained in a video or audio file.
  • I can easily control how I see and hear distinctions if colour or sound convey meaning.
  • I can operate the content using only a keyboard.
  • I can use content without experiencing a seizure or physical reaction.

Web Content Accessibility Guidelines success criteria:

Include a transcript for audio-visual and audio-only

Time-based media (video and audio) must have a transcript so they are accessible to everyone. Transcripts are also important for search engine indexing.

Accessibility requirements

User needs:

I can access equivalent information to anything contained in a video or audio file.

WCAG quick reference:

Have a person transcribe the text. There are many professional services that offer transcription. If you use automated transcribing software, review it for errors and correct them.

  • Format the transcript in plain text or simple HTML.
  • Consider breaking the transcript into headings and adding time markers. This means that people can easily find and play a part of the audio.
  • You can place the transcript on the same page as the content. Alternatively, you can also add a link below or beside the content to a page with the transcript.

Include all dialogue and the speakers’ names. If there is only one speaker, introduce them by name in the transcript. You can then leave out their name with the dialogue, unless it is crucial to the content.

Write non-verbal information in square brackets.


‘… to get to the other side [laughing].’

End the transcript with ‘End of transcript’.
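If you assemble transcripts from structured notes, the conventions above can be applied consistently in code. This is only a sketch: the `Line` type, the "Speaker:" introduction and the layout are assumptions for illustration, not a required format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    speaker: str
    text: str
    non_verbal: Optional[str] = None  # for example "laughing"


def format_transcript(lines: List[Line]) -> str:
    """Build a plain-text transcript from spoken lines."""
    single_speaker = len({line.speaker for line in lines}) == 1
    output = []
    if single_speaker:
        # Introduce a lone speaker once, then leave their name off each line.
        output.append(f"Speaker: {lines[0].speaker}")
    for line in lines:
        text = line.text
        if line.non_verbal:
            # Non-verbal information goes in square brackets.
            text = f"{text} [{line.non_verbal}]"
        output.append(text if single_speaker else f"{line.speaker}: {text}")
    output.append("End of transcript")
    return "\n".join(output)
```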

Include subtitles or captions for video

Subtitles and captions are not the same:

  • Subtitles display only the dialogue and are commonly used for translations.
  • Captions display any audio in a video, including dialogue and sound cues.

Captions can be open captions or closed captions.

W3C’s Making audio and video media accessible has more instructions at ‘Captions/subtitles’.

Accessibility requirements

User needs:

I can access equivalent information to anything contained in a video or audio file.

Web Content Accessibility Guidelines success criteria:

Open captions, which the user cannot turn off, are burned onto the video itself. Closed captions can be turned on or off by the user.

If you have a choice, closed captions are preferable to subtitles or open captions. Some media players also allow for multilingual captions that users can select on demand if they need translations.

Use Australian terms and spelling for an Australian audience, regardless of the accent or intention of the speaker. Government agencies might use alternative spellings in video content for some overseas audiences.

Closed captions come in many file formats. SRT is a common format and is compatible with most online platforms and media players.

Load the closed caption file onto the video platform as a separate file to the video file. This allows the user to turn the captions on or off.

You can edit closed caption files.
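An SRT file is plain text. Each caption has a sequence number, a start and end time in the form hours:minutes:seconds,milliseconds, and the caption text, followed by a blank line. The sketch below writes that structure from a list of captions. The function names and the input shape (start seconds, end seconds, text) are assumptions for illustration.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Turn (start, end, text) captions into the text of an SRT file."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)
```

Because the result is plain text, you can open and correct it in any text editor before loading it onto the video platform.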

Time the captions and subtitles to match the dialogue

Keep captions and subtitles short: no more than 2 lines, with no more than 42 characters per line. Make line breaks where there is a natural break in the sentence (for example, after a comma).
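The line and character limits above are easy to check automatically. A minimal sketch, assuming each caption is a string with its lines separated by newline characters (the function name is hypothetical):

```python
from typing import List


def caption_problems(caption: str,
                     max_lines: int = 2,
                     max_chars: int = 42) -> List[str]:
    """List the ways a caption breaks the line and length limits."""
    problems = []
    lines = caption.split("\n")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines (maximum {max_lines})")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_chars:
            problems.append(
                f"line {number} has {len(line)} characters (maximum {max_chars})"
            )
    return problems
```

An empty list means the caption is within both limits. The check cannot tell whether a line break falls at a natural point in the sentence, so a person still needs to review the captions.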

People read about 2 lines of text in 2 seconds. Test that users can read the captions in the time available.

Don’t leave the captions on the screen:

  • after the speech has finished
  • across the transition from one scene to the next.

Try to ensure a small break of at least 4 frames between subtitles.
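The gap between subtitles can be checked from their timings. The sketch below assumes each subtitle is a (start, end) pair in seconds, sorted by start time, and that you know the frame rate of the video. The 25 frames per second default and the function name are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def short_gaps(cues: Sequence[Tuple[float, float]],
               frames_per_second: float = 25.0,
               min_gap_frames: int = 4) -> List[int]:
    """Return the positions of subtitles that start too soon after the last one."""
    flagged = []
    for index in range(1, len(cues)):
        previous_end = cues[index - 1][1]
        next_start = cues[index][0]
        gap_frames = (next_start - previous_end) * frames_per_second
        # A tiny tolerance avoids false alarms from floating-point rounding.
        if gap_frames + 1e-9 < min_gap_frames:
            flagged.append(index)
    return flagged
```

At 25 frames per second, 4 frames is 0.16 seconds, so a subtitle that starts 0.04 seconds after the previous one ends would be flagged.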

Review and edit the accuracy and synchronised timing of the captions for pre-recorded video.

Centre subtitles and captions at the bottom of the screen

If there are 2 speakers:

  • give each speaker a separate line and start each line with a dash
  • align each line to the left of the subtitle block so that their left edges stack above each other
  • centre the whole subtitle or caption block.

Don’t use a full stop at the end of subtitles. Use an ellipsis to indicate the end of a sentence where speech trails off or where a sentence remains unfinished.

Use italics for:

  • foreign words and phrases
  • dialogue that is not on the screen, such as a public announcement or dialogue from a television program.

A hash symbol (#) or a quaver symbol (♪) shows song lyrics.

Add audio description for video

Audio description tells people what is happening in the visual elements of the video. It could be the only way someone who is blind or has low vision will know visual cues such as the:

  • setting
  • costumes
  • displayed text (such as a sign)
  • movements.

The audio description describes these visual cues to low-vision users during gaps in the dialogue.

W3C’s Making audio and video media accessible details this aspect of accessibility in ‘Audio description of visual information’.

Accessibility requirements

User needs:

I can access equivalent information to anything contained in a video or audio file.

Web Content Accessibility Guidelines success criteria:

You don’t need an audio description for dialogue delivered straight to camera (‘talking heads’).

For anything else, you need to identify who is speaking in the description. If location is important, include that too.

Use audio description to give context if it is not obvious from the title.

Ensure that:

  • the content is accurate
  • the describer speaks clearly.

Include a title, description and metadata

Any video or audio content you publish must have:

  • a title, description and metadata
  • an appropriate file name.

The title and description tell people what the content is about. Make sure the title:

  • has fewer than 70 characters so that search engines can find it
  • includes keywords
  • is descriptive, telling users what the content is about.

The description puts the content in context wherever you’re publishing it. The description helps users know what they’ll get out of the content so they can decide whether to view it.

Use the description to explain if the content is part of a series and tell people how they can subscribe. You can also add:

  • branding
  • a publication date
  • links to related webpages.

Each file format has different metadata fields. Metadata tags store specific information about the file, like the:

  • creation date
  • creator
  • owner
  • talent
  • file size
  • file name.

Some fields fill automatically, such as the creation date. You’ll have to fill other fields manually. Search engines use information from metadata to include files in searches.

Information management requirements

You must use the Protective Security Policy Framework (PSPF) when preparing government information. The PSPF covers how information is classified and marked, and what this means for its storage, handling, access and disposal. Consult the PSPF webpages or your organisation’s protective security policy.

You must also follow your organisation’s information management requirements. Information (including data) that you create as part of your work for the Australian Government is a record. It provides evidence of what your organisation has done and why.

Managing and disposing of records properly is a requirement under the Archives Act 1983. For guidance, visit the National Archives of Australia website for content on information management standards.

Store video in a standard file format

All video should be in MP4 (MPEG-4), a commercial and commonly used video file format. MP4 is a compressed digital video file format that allows streaming and downloading of videos from the internet. Most mobile devices support MP4 videos without people having to install an application (app) to play them. MP4 also supports metadata tags, attachments and interactive menus.

Meet common audio-only technical specifications

Audio-only content must meet common technical specifications (specs) for publishing on different platforms:

  • Government agencies have internal guidelines for publishing audio on their websites.
  • Audio hosting platforms have their own spec lists for audio quality.

Use a booth or studio if you can

It is best to record audio in a sound booth or recording studio. If you don’t have access to either, choose an environment you can control.

Recording in quiet environments makes it easier to understand the dialogue. Don’t record in windy conditions or in environments with:

  • background noise from air conditioners
  • background noise from people talking
  • hard surfaces such as glass and concrete, because sound bounces off them.

Location noise is sometimes used in audio to set the scene or to indicate a time progression. Including this is a job for audio professionals.

Don’t record in noisy environments. Instead, record in quiet spaces with soft furnishings such as carpets or fabric, because they absorb sound.

Get expert help to edit and mix audio

The aim of audio editing is to produce audio that:

  • is clear and technically balanced
  • sounds natural while telling a cohesive and engaging story.

Whoever edits the audio must have the technical expertise to meet the accessibility and technical standards required. They will be familiar with audio editing tools, including restoration and enhancement software.

If the audio includes music or sound effects, you must still be able to hear and understand the person speaking. If several people are speaking, it should be easy to tell each person’s voice from the others.

Mix historical audio to match the decibel level of the rest of the recording.

Accessibility relates closely to technical standards for audio. W3C’s Making audio and video media accessible gives an overview of the relationship in ‘Audio content and video content’.

Accessibility requirements

User needs:

I can easily control how I see and hear distinctions if colour or sound convey meaning.

Web Content Accessibility Guidelines success criteria:

1.4.7 Low or no background audio – level AAA

Apply government branding to all audio and video

Branding applies to all government content, regardless of the platform that supports access to the content.

Text in video

Style all text in videos:

  • with clear and simple fonts (sans serif is usually the best choice)
  • to ensure that the text contrasts strongly with the background.

Accessibility requirements

User needs:

I can easily control how I see and hear distinctions if colour or sound convey meaning.

Web Content Accessibility Guidelines success criteria:

1.4.3 Contrast (minimum) – level AA

Avoid effects if they make the text less readable. Seek expert advice if you plan to use text effects (such as drop shadows, outlines and tinted blocks). If they are well designed, they can help give enough contrast and separation to make the text easier to read. Poorly designed text effects can make text very difficult to read.

Avoid colours that make the video less accessible for some users. For example, avoid using red and green as the only way to convey meaning.

Visual identifiers

Use your organisation’s branding guidelines to make sure any video is clearly government content. Visual branding tells people that video is created on behalf of the Australian Government.

Check that all text complies with your organisation’s branding guidelines. This includes text formatting and spelling. In this way, the video text is consistent with other content that your organisation publishes.

Branded sounds

Check if your agency has ‘branded’ sounds (also called audio logos). These help users immediately associate audio-visual or audio-only content with your agency.

Branded sounds occur at the very beginning or end of the content. They need to be short (no longer than 6 seconds) and provide a consistent identity for your organisation.

Set the video aspect and resolution for the user’s device

Whether you publish the video on a platform or your agency website, you will need to consider aspect ratio and resolution. Videos are more usable if they use as much of the screen space as possible.

Most people use their mobile devices (phones and tablets) to access social media. Square videos (with a 1:1 aspect ratio) are popular on Instagram. Vertical videos (with a 9:16 aspect ratio) are popular in the Stories feature in Facebook, YouTube, Instagram and other platforms.

The videos you access on a desktop computer are usually a horizontal rectangle. They usually have a 16:9 horizontal aspect ratio, with a minimum resolution of 1,920 pixels wide by 1,080 pixels high.
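The aspect ratio follows directly from the pixel dimensions: divide the width and height by their greatest common divisor. A minimal sketch, with hypothetical function names, that also reports the orientation and checks the desktop minimum described above:

```python
from math import gcd
from typing import Tuple


def aspect_ratio(width: int, height: int) -> Tuple[int, int]:
    """Reduce pixel dimensions to the simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def orientation(width: int, height: int) -> str:
    """Describe the shape of the video frame."""
    if width == height:
        return "square"
    return "horizontal" if width > height else "vertical"


def meets_desktop_minimum(width: int, height: int) -> bool:
    """Check for 16:9 at 1,920 by 1,080 pixels or larger."""
    return aspect_ratio(width, height) == (16, 9) and width >= 1920 and height >= 1080
```

For example, a 1,080 by 1,920 pixel video reduces to 9:16, which is the vertical format used in Stories.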

Figure: Aspect ratios for horizontal and vertical display on mobile devices

Introduce speakers in the video using text and graphics (lower thirds)

‘Lower thirds’ is the term for a graphic and text overlay that shows a speaker’s name, organisation and role. Using lower thirds is mandatory. They are usually an image file added into the video editing application.

Despite the name, lower thirds don’t have to be in the lower third of the screen. Be guided by where the person’s face appears. They must appear in the ‘title-safe’ area. This is the area within a video frame where you can put text without the edges of the screen cutting it off. When adding a lower third to a video, a title-safe guide in the editing application appears over the text to help position it.

Be careful when lower thirds appear on the screen: don’t obscure the video’s captions and subtitles.

For interviews or ‘talking heads’, the usual style is to place the speaker’s name on the top line. Place the speaker’s title and organisation on the second line.

Lower thirds must:

  • be appropriate for government content
  • be correct
  • be as short as possible
  • have a strong contrast with the video’s background
  • use a text font that is readable at smaller resolutions or when compressed.

A sans serif font is the easiest to read, but use your organisation’s branding and style.

Get permissions and licences for copyright material

You must have a signed release from people who appear in a video. This includes everyone who is filmed and people whose voices have been recorded. Keep the signed release form and give a copy to the person who signed it.

Privacy requirements

Your organisation has obligations under the Privacy Act 1988.

Privacy is relevant whenever it’s possible to identify someone. Treat things that can or might identify an individual as personal information. Personal information can include things like a voice recording or someone’s appearance.

When you handle personal information, you must comply with the Australian Privacy Principles. Personal information is any information that could identify an individual, in any format.

Other restrictions might apply to copyright.

Some donors of material to collecting institutions may have restricted its use. Donors may also need to give written permission before you use material. Check with the collecting institution.

Before you use First Nations Australians’ content, you need proper permissions. Always consult with relevant communities (or individuals) and copyright holders. The permissions process is vital and differs from standard copyright procedures.

Copyright requirements

You must get permission (a licence) to use copyright material. This includes text, lyrics, images, video and sounds.

You must choose a licence to release copyright materials. If you work in government, use an open access licence if you can (for example, Creative Commons).

Read the government copyright rules in the Australian Government intellectual property manual.

Use stock video if it enhances the production

Stock video is short content that you have not produced. Using stock video can save production time and money.

Government has its own stock video resources in large audiovisual collections, dating back to the 1890s. Collections cover a wide range of topics about Australian life, culture and history. For research support and usage terms and conditions, contact the respective access teams at the National Film and Sound Archive of Australia and the National Archives of Australia.

Be careful with commercial stock video. People will identify and dismiss stock video that looks as if it is not Australian or doesn’t seem genuine.

Other than historical material, ensure that stock video:

  • is crisp
  • has natural colours
  • focuses on people
  • meets common technical standards
  • is visibly Australian.

To use a stock video, you must purchase a licence from its owner. Some stock video has a Creative Commons licence.

Release notes

The digital edition significantly updates and expands information about video. It adds new guidance for audio as a distinct format. Updates and revisions cover when to use time-based media, mandatory requirements and accessibility.

The sixth edition mentioned video briefly in several sections but did not give comprehensive information on technical requirements for audio-visual content. It referred to ‘moving images’.

The Content Guide had guidelines on video use, content and length. It had information about accessibility requirements for video, including the use of audio description.

About this page


Abrahams D (2017) SRT file – what is it and how to create one, Ai-Media website, accessed 16 September 2020.

Ai-Media, Closed captions vs subtitles – what’s the difference?, Ai-Media website, accessed 23 July 2020.

Australian Communications and Media Authority (2020) Broadcaster compliance with TV captioning obligations, ACMA website, accessed 16 September 2020.

Clement J (2020) Device usage of Facebook users worldwide as of July 2020, Statista website, accessed 16 September 2020.

Content Design London (2020) ‘Moving images’, Content Design London readability guidelines, Content Design London website, accessed 16 September 2020.

General Services Administration (n.d.) ‘Multimedia’, 18F accessibility guide, 18F website, accessed 16 September 2020.

Lynch PJ and Horton S (2016) Web style guide, 4th edn, Yale University Press, New Haven and London.

New Zealand Government (2020) Accessible video, accessed 16 September 2020.

StudioBinder (22 March 2020) ‘What is a lower third? Definition and design strategies’, StudioBinder blog, accessed 16 September 2020.

Treasury Board of Canada Secretariat (2020) ‘Images and videos’, content style guide, accessed 16 September 2020.

W3C (World Wide Web Consortium) (n.d.) ‘Understanding guideline 1.2: time-based media’, Understanding WCAG 2.1, W3C website, accessed 16 September 2020.

W3C (2016) Techniques for WCAG 2.0, W3C website, accessed 20 May 2020.

W3C (2019) ‘Audio content and video content’, Making audio and video media accessible, W3C website, accessed 25 August 2020.

WebAIM (2020) Captions, transcripts, and audio descriptions, WebAIM website, accessed 16 September 2020.

York A (2020) ‘Always up-to-date guide to social media video specs’, Sprout, accessed 16 September 2020.

This page was updated Monday 8 August 2022.