Captions & Transcripts

Upload your video. Captions, transcripts, and chapters — done.

VideoNest Audio automatically transcribes every upload, generating synchronized WebVTT captions, word-level JSON, and searchable plain text. Add chapter markers to help viewers navigate. No manual work, no third-party tools.

VideoNest Audio · Transcribing parakeet-tdt-0.6b

00:00:00 The number one, I believe, sold for a million bucks.

00:00:02 One day they called me from the SEC and they said,

00:00:05 we have 15 people here in our room and want to listen...

00:00:10 This is the National Stock and Bonds Show

Horizontal · 16:9 · captions auto-generated

Vertical · 9:16

Captions scale automatically to every aspect ratio. Horizontal broadcasts, short-form reels, and portrait embeds all use the same transcription output.

Transcript formats

Two formats. One transcription.

From a single audio pass, VideoNest generates a caption-ready WebVTT file and a word-level JSON transcript. The examples below come from the sample video (video_id 454084).

WebVTT Caption File · Player-ready

WEBVTT

1
00:00:00.000 --> 00:00:02.560
The number one, I believe, sold for a million bucks.

2
00:00:10.560 --> 00:00:14.080
This is the National Stock and Bonds Show in Washington, D.C.

hosted_file · file_type=VTT · video_id=454084
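For readers who want to work with the caption file outside the player, here is a minimal Python sketch that parses cues like the ones above into start time, end time, and text. The function names (`parse_vtt`, `to_seconds`) are illustrative and not part of any VideoNest API, and the sketch assumes timestamps in the full `hh:mm:ss.ttt` form shown in the sample.

```python
import re

# Matches a cue timing line such as "00:00:00.000 --> 00:00:02.560".
# Assumption: timestamps always include the hours field, as in the sample above.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert timestamp fields (as strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Blocks are separated by blank lines; the "WEBVTT" header block has no
    # timing line and is skipped automatically.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                g = match.groups()
                cue_text = " ".join(lines[i + 1:])
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), cue_text))
                break
    return cues


SAMPLE_VTT = """WEBVTT

1
00:00:00.000 --> 00:00:02.560
The number one, I believe, sold for a million bucks.

2
00:00:10.560 --> 00:00:14.080
This is the National Stock and Bonds Show in Washington, D.C.
"""

cues = parse_vtt(SAMPLE_VTT)
for start, end, text in cues:
    print(f"{start:.2f}s to {end:.2f}s: {text}")
```

Converting timestamps to seconds makes the cues directly comparable with the `start` and `end` values in the word-level JSON transcript.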

JSON Word-Level Transcript · Developer & search

{
  "video_id": 454084,
  "model": "nvidia/parakeet-tdt-0.6b-v3",
  "language": "en",
  "words": [
    { "word": "The",    "start": 0.00, "end": 0.16, "confidence": 0.98 },
    { "word": "number", "start": 0.16, "end": 0.48, "confidence": 0.97 },
    { "word": "one,",   "start": 0.48, "end": 0.80, "confidence": 0.99 }
    // ... one entry per spoken word
  ]
}

hosted_file · file_type=transcript-json · video_id=454084
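To show how the two formats relate, the following Python sketch groups word-level entries into WebVTT cues. It illustrates the general idea only and is not VideoNest's actual cue-segmentation logic. The function names and the `max_words` and `max_gap` thresholds are assumptions chosen for the example, and the sample data contains just the three words shown above.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_vtt(words, max_words=8, max_gap=1.0):
    """Group word entries into cues, starting a new cue after a long pause
    or once the current cue reaches max_words (both thresholds are assumed)."""
    cues, current = [], []
    for w in words:
        too_long = len(current) >= max_words
        paused = bool(current) and w["start"] - current[-1]["end"] > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    lines = ["WEBVTT", ""]
    for idx, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0]["start"])
        end = format_timestamp(cue[-1]["end"])
        lines.append(str(idx))
        lines.append(f"{start} --> {end}")
        lines.append(" ".join(w["word"] for w in cue))
        lines.append("")
    return "\n".join(lines)


# Only the three word entries shown in the sample transcript above.
transcript = json.loads("""
{
  "video_id": 454084,
  "language": "en",
  "words": [
    { "word": "The",    "start": 0.00, "end": 0.16, "confidence": 0.98 },
    { "word": "number", "start": 0.16, "end": 0.48, "confidence": 0.97 },
    { "word": "one,",   "start": 0.48, "end": 0.80, "confidence": 0.99 }
  ]
}
""")

vtt = words_to_vtt(transcript["words"])
print(vtt)
```

Because each word carries its own timing, cue boundaries can be re-cut for different layouts, such as shorter cues for vertical video, without re-transcribing the audio.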

Capabilities

Built for every viewer, every context

Auto-Transcription

Transcribed on upload

VideoNest Audio runs on every hosted video. WebVTT, JSON, and plain text are generated automatically. No manual caption-file upload required.

Navigation

Chapter markers

Add chapter markers to any video so viewers can jump directly to the section they want. Set titles and timestamps in your video settings — the player handles the rest.

Multi-Language

Multiple caption tracks

Attach multiple caption files per video. Viewers choose their language from the player controls. No separate video files needed.

Accessibility

WCAG 2.1 AA controls

Player controls are keyboard-navigable and screen-reader compatible, and they meet WCAG 2.1 AA accessibility standards at the player level.

Sound-Off Reach

Auto-on captions

Configure captions to display by default — for social embeds, news feeds, and mobile placements where viewers watch without audio.

SEO & Search

Searchable transcripts

Word-level JSON transcripts make every second of your video searchable. Index content, power in-site search, and surface video at the right moment.
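As one way to picture transcript search, this Python sketch finds where a phrase is spoken and returns the start time of each match, which could then be used to seek the player. It assumes the word-entry shape shown in the JSON sample (`word`, `start`, `end`). The `find_phrase` helper is hypothetical and not a VideoNest function.

```python
import string


def normalize(token):
    """Lowercase a token and strip surrounding punctuation ("one," -> "one")."""
    return token.strip(string.punctuation).lower()


def find_phrase(words, phrase):
    """Return the start time (in seconds) of every place the phrase occurs
    as a run of consecutive spoken words."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    spoken = [normalize(w["word"]) for w in words]
    hits = []
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            hits.append(words[i]["start"])
    return hits


# The three word entries from the sample transcript.
words = [
    {"word": "The", "start": 0.00, "end": 0.16, "confidence": 0.98},
    {"word": "number", "start": 0.16, "end": 0.48, "confidence": 0.97},
    {"word": "one,", "start": 0.48, "end": 0.80, "confidence": 0.99},
]

print(find_phrase(words, "number one"))  # [0.16]
```

A production search would more likely feed the plain-text or JSON transcript into a search index. The linear scan here is only meant to show how word timings map a query to a moment in the video.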

More player features

More ways to control the viewer experience

Autoplay & Sound

Branding Controls

Embed Options

Player Customization

Player API

Vertical Video Player

Upload once. Captions, chapters, and transcripts — automatic.