
Learn how to use requestVideoFrameCallback() to work more efficiently with videos in the browser.


There is a new web API on the block, defined in the HTMLVideoElement.requestVideoFrameCallback() specification. The requestVideoFrameCallback() method allows web authors to register a callback that runs in the rendering steps when a new video frame is sent to the compositor. This is intended to allow developers to perform efficient per-video-frame operations on video, such as video processing and painting to a canvas, video analysis, or synchronization with external audio sources.

Difference from requestAnimationFrame()

Operations made through this API, such as drawing a video frame to a canvas with drawImage(), are synchronized as a best effort with the frame rate of the video playing on screen. In contrast to window.requestAnimationFrame(), which typically fires about 60 times per second, requestVideoFrameCallback() is tied to the actual video frame rate, with an important exception:

The effective rate at which callbacks run is the lesser of the video's rate and the browser's rate. This means a 25fps video playing in a browser that paints at 60Hz would fire callbacks at 25Hz. A 120fps video in that same 60Hz browser would fire callbacks at 60Hz.
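This rule can be captured in a tiny helper; the function name is illustrative, not part of the API:

```javascript
// Illustrative helper (not part of the API): the effective callback rate
// is the lower of the video's frame rate and the browser's paint rate.
const effectiveCallbackRate = (videoFps, displayHz) => Math.min(videoFps, displayHz);

console.log(effectiveCallbackRate(25, 60));  // 25 — a 25fps video on a 60Hz display
console.log(effectiveCallbackRate(120, 60)); // 60 — a 120fps video capped by a 60Hz display
```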

What's in a name?

Due to its similarity to window.requestAnimationFrame(), the method was initially proposed as video.requestAnimationFrame(), but I'm happy with the new name, requestVideoFrameCallback(), which was agreed upon after a long discussion. Yay, bikeshedding for the win!

Browser support and feature detection

The method is implemented in Chromium already, and Mozilla folks like it. For what it's worth, I have also filed a WebKit bug asking for it. Feature detection of the API works like this:

if ('requestVideoFrameCallback' in HTMLVideoElement.prototype) {
  // The API is supported!
}

Using the requestVideoFrameCallback() method

If you have ever used the requestAnimationFrame() method, you will immediately feel at home with the requestVideoFrameCallback() method. You register an initial callback once and then re-register whenever the callback fires.

const doSomethingWithTheFrame = (now, metadata) => {
  console.log(now, metadata);
  video.requestVideoFrameCallback(doSomethingWithTheFrame);
};

video.requestVideoFrameCallback(doSomethingWithTheFrame);

In the callback, now is a DOMHighResTimeStamp and metadata is a VideoFrameMetadata dictionary with the following properties:

  • presentationTime, of type DOMHighResTimeStamp: The time at which the user agent submitted the frame for composition.
  • expectedDisplayTime, of type DOMHighResTimeStamp: The time at which the user agent expects the frame to be visible.
  • width, of type unsigned long: The width of the video frame, in media pixels.
  • height, of type unsigned long: The height of the video frame, in media pixels.
  • mediaTime, of type double: The media presentation timestamp (PTS) in seconds of the presented frame (for example, its timestamp on the video.currentTime timeline).
  • presentedFrames, of type unsigned long: A count of the number of frames submitted for composition. Allows clients to determine if frames were missed between instances of VideoFrameRequestCallback.
  • processingDuration, of type double: The elapsed duration in seconds from the submission of the encoded packet with the same presentation timestamp (PTS) as this frame (for example, the same as the mediaTime) to the decoder until the decoded frame was ready for presentation.
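For example, presentedFrames can be used to detect missed frames between two invocations of the callback. This is a sketch with a hypothetical helper name:

```javascript
// Hypothetical helper: successive callbacks should see presentedFrames
// increase by exactly 1; a larger jump means frames were missed in between.
const framesMissedBetween = (prevPresentedFrames, currPresentedFrames) =>
  currPresentedFrames - prevPresentedFrames - 1;

console.log(framesMissedBetween(41, 42)); // 0 — no frames missed
console.log(framesMissedBetween(41, 45)); // 3 — three frames were missed
```

In a real callback you would stash metadata.presentedFrames from the previous invocation and compare it against the current value.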

For WebRTC applications, additional properties may appear:

  • captureTime, of type DOMHighResTimeStamp: For video frames coming from a local or remote source, this is the time at which the frame was captured by the camera. For a remote source, the capture time is estimated using clock synchronization and RTCP sender reports to convert RTP timestamps to capture time.
  • receiveTime, of type DOMHighResTimeStamp: For video frames coming from a remote source, this is the time at which the encoded frame was received by the platform, that is, the time at which the last packet belonging to this frame was received over the network.
  • rtpTimestamp, of type unsigned long: The RTP timestamp associated with this video frame.

Note that width and height may differ from videoWidth and videoHeight in certain cases (for example, an anamorphic video may have rectangular pixels).

Of special interest in this list is mediaTime. In Chromium's implementation, we use the audio clock as the time source backing video.currentTime, whereas mediaTime is populated directly with the presentationTimestamp of the frame. The mediaTime is what you should use if you want to identify frames exactly in a reproducible way, including to identify exactly which frames you missed.
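As a sketch, assuming you know the video's constant frame rate, mediaTime lets you derive a reproducible frame number; the helper name and the 30fps example value are assumptions for illustration:

```javascript
// Sketch, assuming a constant, known frame rate. The half-frame offset
// guards against floating-point jitter in mediaTime.
const frameNumberFromMediaTime = (mediaTime, fps) => Math.floor(mediaTime * fps + 0.5);

console.log(frameNumberFromMediaTime(0.9666666, 30)); // 29
console.log(frameNumberFromMediaTime(2.0, 30));       // 60
```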

Unfortunately, the video element does not guarantee frame-accurate seeking. This has been an ongoing subject of discussion. WebCodecs will eventually allow for frame-accurate applications.

If things seem one frame off…

Vertical sync (or just vsync) is a graphics technology that synchronizes the frame rate of a video with the refresh rate of a monitor. Since requestVideoFrameCallback() runs on the main thread while, under the hood, video compositing happens on the compositor thread, everything from this API is a best effort, and we do not offer any strict guarantees. What may be happening is that the API can be one vsync late relative to when a video frame is rendered. It takes one vsync for changes made to the web page through the API to appear on screen (the same as for window.requestAnimationFrame()). So if you keep updating the mediaTime or frame number on your web page and compare it against the numbered video frames, eventually the video will look like it is one frame ahead.

What is really happening is that the frame is ready at vsync x, the callback fires and the frame is rendered at vsync x+1, and changes made in the callback are rendered at vsync x+2. You can check whether the callback is a vsync late (and the frame is already rendered on screen) by checking whether metadata.expectedDisplayTime is roughly now or one vsync in the future. If it is within five to ten microseconds of now, the frame is already rendered; if the expectedDisplayTime is approximately sixteen milliseconds in the future (assuming your browser/screen refreshes at 60Hz), then you are in sync with the frame.
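That check can be sketched as follows; the helper name and exact threshold are illustrative, following the heuristic above (times are in milliseconds, as with DOMHighResTimeStamp):

```javascript
// Sketch following the heuristic above (all values in milliseconds, as with
// DOMHighResTimeStamp). If expectedDisplayTime is within ~10 microseconds of
// now, the frame is already on screen; if it is about one vsync (~16.6ms at
// 60Hz) ahead, the callback is in sync with the frame.
const isFrameAlreadyRendered = (now, expectedDisplayTime) =>
  expectedDisplayTime - now < 0.01;

console.log(isFrameAlreadyRendered(1000.0, 1000.005)); // true — callback ran a vsync late
console.log(isFrameAlreadyRendered(1000.0, 1016.6));   // false — in sync with the frame
```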


I have created a small Glitch demo that shows how frames are drawn to a canvas at exactly the frame rate of the video, and where the frame metadata is logged for debugging purposes. The core logic is just a couple of lines of JavaScript.

let paintCount = 0;
let startTime = 0.0;

const updateCanvas = (now, metadata) => {
  if (startTime === 0.0) {
    startTime = now;
  }

  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  const elapsed = (now - startTime) / 1000.0;
  const fps = (++paintCount / elapsed).toFixed(3);
  fpsInfo.innerText = `video fps: ${fps}`;
  metadataInfo.innerText = JSON.stringify(metadata, null, 2);

  video.requestVideoFrameCallback(updateCanvas);
};

video.requestVideoFrameCallback(updateCanvas);


I have done frame-level processing for a long time without having access to the actual frames, based only on video.currentTime. I implemented video shot segmentation in JavaScript in a rough fashion; you can still read the accompanying research paper. Had requestVideoFrameCallback() existed back then, my life would have been a lot simpler…


The requestVideoFrameCallback() API was specified and implemented by Thomas Guilbert. This article was reviewed by Joe Medley and Kayce Basques. Hero image by Denise Jans on Unsplash.