
Upload your video. Captions, transcripts, and chapters — done.

VideoNest Audio automatically transcribes every upload, generating synchronized WebVTT captions, word-level JSON, and searchable plain text. Add chapter markers to help viewers navigate. No manual transcription, no third-party tools.

Demo: VideoNest Audio transcribing a horizontal (16:9) video with parakeet-tdt-0.6b, captions auto-generated:

00:00:00 The number one, I believe, sold for a million bucks.
00:00:02 One day they called me from the SEC and they said,
00:00:05 we have 15 people here in our room and want to listen...
00:00:10 This is the National Stock and Bonds Show in Washington, D.C.

Demo: the same captions on a vertical (9:16) short-form video.

Captions scale automatically to every aspect ratio: horizontal broadcasts, short-form reels, and portrait embeds all use the same transcription output.

Two formats. One transcription.

From a single audio pass, VideoNest generates a caption-ready WebVTT file and a word-level JSON transcript. The examples below are from this video.

WebVTT caption file (player-ready)
WEBVTT

1
00:00:00.000 --> 00:00:02.560
The number one, I believe, sold for a million bucks.

2
00:00:10.560 --> 00:00:14.080
This is the National Stock and Bonds Show in Washington, D.C.
hosted_file · file_type=VTT · video_id=454084

Word-level JSON transcript (for developers and search)

{
  "video_id": 454084,
  "model": "nvidia/parakeet-tdt-0.6b-v3",
  "language": "en",
  "words": [
    { "word": "The",    "start": 0.00, "end": 0.16, "confidence": 0.98 },
    { "word": "number", "start": 0.16, "end": 0.48, "confidence": 0.97 },
    { "word": "one,",   "start": 0.48, "end": 0.80, "confidence": 0.99 },
    // ... one entry per spoken word
  ]
}
hosted_file · file_type=transcript-json · video_id=454084
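Both files come from the same word timings, so captions can be derived from the word-level transcript. The sketch below shows one way word entries could be grouped into WebVTT cues. It assumes the field names from the JSON example above (words, word, start, end); the grouping rule (a hypothetical 42-character limit per cue, and a new cue after more than one second of silence) is an illustration, not VideoNest's actual segmentation logic.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_vtt(transcript: dict, max_chars: int = 42, max_gap: float = 1.0) -> str:
    """Group word-level entries into numbered WebVTT cues.

    Hypothetical rule: start a new cue when the next word would push the cue
    past max_chars characters, or when the pause before it exceeds max_gap seconds.
    """
    cues = []
    current = []
    for w in transcript["words"]:
        if current:
            text = " ".join(x["word"] for x in current)
            too_long = len(text) + 1 + len(w["word"]) > max_chars
            long_pause = w["start"] - current[-1]["end"] > max_gap
            if too_long or long_pause:
                cues.append(current)
                current = []
        current.append(w)
    if current:
        cues.append(current)

    lines = ["WEBVTT", ""]
    for number, cue in enumerate(cues, start=1):
        lines.append(str(number))  # optional cue identifier
        lines.append(f"{format_timestamp(cue[0]['start'])} --> {format_timestamp(cue[-1]['end'])}")
        lines.append(" ".join(x["word"] for x in cue))
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


sample = {
    "video_id": 454084,
    "words": [
        {"word": "The", "start": 0.00, "end": 0.16, "confidence": 0.98},
        {"word": "number", "start": 0.16, "end": 0.48, "confidence": 0.97},
        {"word": "one,", "start": 0.48, "end": 0.80, "confidence": 0.99},
    ],
}
print(words_to_vtt(sample))
```

With the three sample words, this prints a single cue spanning 00:00:00.000 --> 00:00:00.800. A production segmenter would likely also break at sentence punctuation and cap cue duration; those rules are left out to keep the sketch short.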

Built for every viewer, every context

Auto-Transcription

Transcribed on upload

VideoNest Audio runs on every hosted video, generating WebVTT, JSON, and plain-text transcripts automatically. No need to upload caption files by hand.

Navigation

Chapter markers

Add chapter markers to any video so viewers can jump directly to the section they want. Set titles and timestamps in your video settings — the player handles the rest.
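If chapters are stored as title and start-time pairs, finding the chapter for the current playback position is a binary search over the sorted start times, and jumping to a chapter is a seek to its start time. This minimal sketch assumes that representation; the Chapter type, the chapter_at helper, and the sample titles are hypothetical, not VideoNest's settings format or player code.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chapter:
    start: float  # seconds from the beginning of the video
    title: str


def chapter_at(chapters: List[Chapter], t: float) -> Optional[Chapter]:
    """Return the chapter playing at time t (seconds), or None before the first chapter.

    Assumes chapters are sorted by start time; each chapter runs until the next one starts.
    """
    starts = [c.start for c in chapters]
    i = bisect_right(starts, t) - 1  # index of the last chapter starting at or before t
    return chapters[i] if i >= 0 else None


# Hypothetical chapter list, for illustration only.
chapters = [Chapter(0.0, "Chapter 1"), Chapter(10.0, "Chapter 2"), Chapter(60.0, "Chapter 3")]
print(chapter_at(chapters, 12.5).title)  # Chapter 2
```

Using bisect_right means a time exactly on a chapter boundary belongs to the chapter that starts there, which matches what a viewer expects after clicking that chapter.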

Multi-Language

Multiple caption tracks

Attach multiple caption files per video. Viewers choose their language from the player controls. No separate video files needed.

Accessibility

WCAG 2.1 AA controls

Player controls are keyboard-navigable and screen-reader compatible. The player itself meets WCAG 2.1 Level AA accessibility standards.

Sound-Off Reach

Auto-on captions

Configure captions to display by default — for social embeds, news feeds, and mobile placements where viewers watch without audio.

SEO & Search

Searchable transcripts

Word-level JSON transcripts make every second of your video searchable. Index your content, power on-site search, and send viewers to the exact moment a word is spoken.
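Because every word carries its own start time, a search hit maps directly to a seek position. This minimal sketch assumes the words, word, and start fields from the JSON example above and a simple normalization (lowercase, punctuation stripped); find_phrase and normalize are hypothetical helpers, not part of a VideoNest API.

```python
import re
from typing import Dict, List


def normalize(token: str) -> str:
    """Lowercase a token and drop punctuation, so "one," matches "one" and "D.C." matches "dc"."""
    return re.sub(r"[^\w']", "", token.lower())


def find_phrase(words: List[Dict], phrase: str) -> List[float]:
    """Return the start time (seconds) of each occurrence of phrase in a word-level transcript."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w["word"]) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append(words[i]["start"])  # seek here to play the match
    return hits


words = [
    {"word": "The", "start": 0.00, "end": 0.16, "confidence": 0.98},
    {"word": "number", "start": 0.16, "end": 0.48, "confidence": 0.97},
    {"word": "one,", "start": 0.48, "end": 0.80, "confidence": 0.99},
]
print(find_phrase(words, "number one"))  # [0.16]
```

The linear scan keeps the example short; for search across a whole library, the normalized tokens and their start times would more typically be loaded into a search index.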

Upload once. Automatic captions and transcripts, plus chapter markers.

VideoNest Audio transcribes every video on upload. Accessible, searchable, and ready to publish anywhere — without touching a third-party tool.