[SOLVED] Seeking help from FFmpeg users - A/V sync issue

Post by **Player701** » Wed Jul 13, 2022 10:29 am

Hello there! I've decided to re-post this from the ffmpeg-user mailing list just in case there are some people here who have experience with FFmpeg and might be willing to help me solve a certain issue related to its usage. I know it's probably a shot in the dark, but I've been struggling with this for a while, so I thought I'd post this in as many places as possible, as long as it would be appropriate (I think I'll also try Super User when I have the time).

I apologize in advance for a lot of text: I wanted to provide as much information as I could about my problem, and I hope it might prove useful in finding a solution.

Synopsis: I'm recording live video from an IP camera that transmits the feed (video+audio) over RTSP. The recording is supposed to run continuously, and the footage is cut up into 10-minute segments. The server performing the recording does not have enough CPU power to transcode the stream, but that's not a problem since the camera provides already-encoded video (H.264 or H.265). This is the command line I'm using:

Code:

ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 \
-i rtsp://login:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy -map 0:a -c:a copy \
-f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
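For readers unfamiliar with the segment muxer: with -segment_atclocktime 1, segment boundaries are aligned to wall-clock multiples of -segment_time counted from 00:00, rather than to the moment the recording started. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and the real cut can only happen at the next keyframe after the boundary.

```python
def next_segment_boundary(seconds_since_midnight, segment_time=600):
    """Next wall-clock boundary (in seconds since 00:00) for the given segment length."""
    return (seconds_since_midnight // segment_time + 1) * segment_time


def hms(seconds):
    """Format seconds since midnight as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# A recording started at 12:03:20 gets its first cut at the next 10-minute mark.
start = 12 * 3600 + 3 * 60 + 20
print(hms(next_segment_boundary(start)))  # 12:10:00
```

This is why the first segment is usually shorter than 10 minutes, while all later ones start on a round clock time.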

I have multiple IP cameras at home, and this command-line works flawlessly with all but one of them. The problem arises with one specific camera, which is a different make and model than the rest. It is a cheap no-brand Chinese IP camera, and according to the device info page in its web UI, it's called "F8/IPG-9280PGS-AI". Not sure if that's going to help, but mentioning it just in case.

Here's what happens while recording from this camera:

1. Significant video lag gets accumulated over time. E.g. the first frame of the segment started at 12:00:00 may show the time 11:59:50 on the camera clock. The delay is only a few seconds at first, but it gets more and more noticeable with each recording. Note that the camera clock itself is not going out of sync with the actual time - the delay is reset upon restarting the recording process.
2. As a result of #1, audio ends up ahead of video, the delay correlating linearly (and apparently 1:1) to the video lag and increasing over time. E.g. if I knock on a solid surface with my finger in view of the camera, the sound comes much earlier than the video. This becomes easily noticeable after only some 10-15 minutes of continuous recording.

I'm 100% certain that this is not a bug in FFmpeg itself - I have different versions on the server (4.4.2, from the FreeBSD ports collection) and on my desktop machine (5.0.1, from gyan.dev), and the issue manifests with both of them, and only with one specific camera (see above).

FFmpeg output upon running the command line above (banner, compilation flags etc. omitted for brevity):

Code:

Input #0, rtsp, from 'rtsp://login:password@ip.ad.dre.ss:554/url':
  Metadata:
    title           : RTSP Server
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 2560x1440, 25 fps, 25 tbr, 90k tbn
  Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Output #0, segment, to '%Y-%m-%dT%H-%M-%S.mkv':
  Metadata:
    title           : RTSP Server
    encoder         : Lavf59.16.100
  Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 2560x1440, q=2-31, 25 fps, 25 tbr, 1k tbn
  Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)

What else I've found out about this issue so far:

0. The web UI of the camera itself plays back the feed flawlessly

I mean, it requires Internet Explorer (or Edge in IE mode) to work, but I've streamed the feed from there for hours, and there weren't any delays or desyncs at all. If it stays in sync there, then there must be a way to make it stay in sync during recording.

1. It only appears when copying the audio stream along with the video

If the audio stream is ignored, the video lag is gone - the camera clock time shown on the first frame of each segment is always on time (give or take a keyframe interval of 2 seconds).

Code:

# this does not have video lag accumulating over time
ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 \
-i rtsp://login:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy -f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"

However, removing audio is not a viable solution for me - I do want to have audio recorded along with the video.

2. It is related to the camera dropping video frames

FFmpeg reports that the RTSP stream has a constant frame rate of 25 FPS, but it seems that the camera is not able to maintain it consistently at all times. A 10-minute segment is normally supposed to have 15000 frames (600 seconds x 25 FPS), but will sometimes contain about 14900 or even fewer. Despite that, no perceptible video playback issues are observed. My experiments suggest, with near certainty, that the audio/video lag increases only when a segment has fewer frames than expected. Note that this frame count discrepancy is a significant difference compared to the other cameras, which do not have sync issues. I picked some segments recorded from them at random, and they all had exactly 15000 frames.
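To put numbers on the frame-count observation, here is a minimal sketch of the arithmetic. It assumes every frame is meant to last exactly 1/25 of a second; the function names are illustrative only.

```python
def expected_frames(segment_seconds, fps):
    """Frames a constant-frame-rate stream should deliver in one segment."""
    return segment_seconds * fps


def missing_seconds(actual_frames, segment_seconds, fps):
    """Video time unaccounted for if each frame is assumed to last 1/fps seconds."""
    return (expected_frames(segment_seconds, fps) - actual_frames) / fps


print(expected_frames(600, 25))         # 15000
print(missing_seconds(14900, 600, 25))  # 4.0
```

So a segment that is 100 frames short corresponds to 4 seconds of video at the nominal rate.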

3. Can be rectified with -use_wallclock_as_timestamps 1, but the video stutters

This option, according to the FFmpeg documentation, ignores the timestamps in the input stream and constructs new ones based on the system clock. It completely eliminates any synchronization issues, but the recorded video exhibits constant stuttering with a period roughly equal to the keyframe interval (2 seconds). This stuttering is very much noticeable, and I'd prefer to avoid it.

4. -vsync cfr doesn't help (likely needs transcoding)

When it comes to multimedia, my knowledge is very hit-and-miss. I've never worked with it professionally, so most of what I know comes from the internet. And so I thought: if the video does not have enough frames, perhaps there is a way to duplicate some existing frames to match the expected frame rate? The documentation on FFmpeg mentions a -vsync option that seemed like the way to go:

FFmpeg documentation wrote:
cfr (1)
Frames will be duplicated and dropped to achieve exactly the requested constant frame rate.

Alas, this did not work as I'd hoped, and having researched the topic a bit more, I think now I know why: FFmpeg likely needs to decode the video to be able to duplicate a frame because not every frame in the encoded stream is self-contained. Needless to say, transcoding a 1440p video stream is completely out of the question for me: the server simply does not have enough CPU resources for that. Furthermore, it seems like a needless waste of energy because the output from the camera comes already encoded.

5. Sync audio to video instead of video to audio?

It doesn't take nearly as much CPU time to re-encode audio compared to video, and if I want to live-stream the feed e.g. to YouTube (and in this particular case, I do), the pcm_mulaw codec is not supported there anyhow. For live-streaming, I use AAC with the following options:

Code:

-c:a aac -ar 48000 -ac 2 -b:a 128k

I tried this with other cameras, and there was sometimes a slight delay between the audio and video streams, not increasing with time. With some Google-fu, I managed to get rid of it by also adding:

Code:

-af aresample=async=1

However, this still didn't resolve the sync issues with the problematic camera. I also tried increasing the async= value up to 10000 and even 100000, to no avail. The documentation suggests it might be helpful in syncing the audio track, but so far my experiments with it have been unsuccessful.

6. Conclusion

And that's it so far. I've been going crazy about this for days, but I'm still not sure what I might be missing here. I never even thought that such a problem could arise in the first place, since both the video and audio come from a single source, and I still can't exactly understand what causes them to go out of sync. So far, the only thing I know for sure is that it happens when the camera doesn't provide enough video frames to maintain a constant FPS.

I will greatly appreciate any help or advice from anyone who has experience with FFmpeg, as long as it doesn't involve transcoding the video stream. I'm not exactly sure, but I think it should be possible to solve my problem without transcoding the audio either (see #0), although it's not a big deal if I have to do it. If necessary, I can also provide sample video files (but beware that they are huge!), or analyze the recorded segments and/or the stream itself with ffprobe or other utilities, provided that they can be downloaded for free from a trusted source.

Thank you very much.

Post by **Player701** » Sat Jul 16, 2022 10:22 am

I've finally managed to solve this, although IMO the solution is far from perfect. It actually works great, but it's the complexity of it that bothers me: I simply can't believe there's no easier way. In case someone ever encounters a similar issue, I'm posting the current solution here.

First, some words about the actual cause of the problem. I thought the audio/video lag increased when the camera couldn't maintain a constant FPS and dropped video frames. But it's actually not the case. The FPS does drop sometimes, but the real problem is timestamp drift. Suppose a video has 25 FPS, and timestamps are measured in 1/1000ths of a second. Then, assuming no dropped frames, the sequence of timestamps would be:

Code:

0, 40, 80, 120, 160, 200, 240, 280, 320, ...

But when recording from this camera, the produced timestamps exhibit a periodic forward drift, e.g.

Code:

0, 40, 81, 121, 163, 203, 248, 328, ...

This drift does not correspond to the actual frame time, and happens even when the FPS is constant. Example: for testing purposes, I recorded the video for several hours without audio. One of the 10-minute segments recorded was reported to have a length of 10 minutes and 12 seconds, but it still had 15000 frames (as expected), and the camera clock ran for exactly 10 minutes. It is obvious that the timestamps are incorrect because during real-time recording, they will eventually appear to come from the future. Unfortunately, FFmpeg seems to have no means of detecting this. (If it actually does, please let me know!)
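For anyone who wants to check their own recordings for this kind of drift, here is a minimal sketch. It assumes ffprobe is on the PATH and that the file has a constant nominal frame rate; the function names are illustrative, and the ffprobe call is only one of several ways to dump packet timestamps.

```python
import subprocess


def parse_pts(text):
    """Parse ffprobe CSV output (one pts_time per line) into seconds."""
    values = []
    for line in text.splitlines():
        field = line.strip().split(",")[0]
        try:
            values.append(float(field))
        except ValueError:
            continue  # skips blank lines and "N/A" entries
    return values


def probe_video_pts_ms(path):
    """Video packet timestamps of a recorded file, in milliseconds (requires ffprobe)."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "packet=pts_time", "-of", "csv=p=0", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [round(seconds * 1000) for seconds in parse_pts(out)]


def drift_ms(timestamps_ms, fps=25):
    """How far each timestamp is ahead of the ideal grid 0, 1/fps, 2/fps, ..."""
    frame_ms = 1000 / fps
    return [ts - i * frame_ms for i, ts in enumerate(timestamps_ms)]


# The drifting sequence from the post, in 1/1000ths of a second:
print(drift_ms([0, 40, 81, 121, 163, 203, 248, 328]))
# [0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 8.0, 48.0]
```

On a real file, drift_ms(probe_video_pts_ms("segment.mkv")) would show whether the offset keeps growing over the course of a segment ("segment.mkv" is a placeholder name). Note that this only makes sense for a recording without dropped frames, since a dropped frame also shifts the grid.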

Now, while recording with audio, the audio stream does not seem to exhibit any timestamp drift. Therefore, when matching video and audio timestamps for the output, a gradual lag between audio and video is introduced.

To mitigate this problem, I first tried to adjust the timestamps with the "setts" bitstream filter (see the FFmpeg bitstream filters documentation). But no matter what I did with them, the video still went out of sync with audio in the long run. I analyzed a few more recordings and found out that the drift offset can vary wildly: for example, during the course of several minutes, it can increase from a mere 2 to more than 1000 milliseconds. These timestamps simply cannot be trusted; they have to be discarded entirely. And -use_wallclock_as_timestamps 1 does just that - but also introduces stuttering. I theorized that it might be possible to fix this stuttering, and eventually came up with the following filter:

Code:

setts='max(floor(PTS/X)*X,if(N,PREV_OUTPTS+X))'

Here, PTS is the current timestamp, generated from the wallclock time, PREV_OUTPTS is the previous timestamp, and X is some constant that has to be replaced with a value depending on the application. N is simply the packet number, which is equal to 0 at the start. if(N,<expression>) evaluates to <expression> if N is not 0, and otherwise evaluates to 0.

If you're not too bad at math, you can see that the filter does two things: it rounds each timestamp down to the nearest multiple of X, and it ensures that the values always increase by at least X from one packet to the next. As for X, when recording it has to be set to 40, provided that the values are measured in 1/1000ths of a second and the camera gives 25 FPS (1000/25 = 40). Now, the latter is not always the case, and if the frame rate changes, even stranger things will happen. But we can ensure they don't by assuming a constant frame rate for the input stream with -r 25. The filter has to be applied to both video and audio to ensure smooth playback.
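To see what the expression does to jittery wallclock timestamps, here is a small simulation. It is only a sketch of the arithmetic, not of FFmpeg's internals: the input values are made up, and the function name is illustrative.

```python
import math


def snap_timestamps(wallclock_pts, step):
    """Mimic setts='max(floor(PTS/X)*X,if(N,PREV_OUTPTS+X))' with X = step."""
    out = []
    for n, pts in enumerate(wallclock_pts):
        snapped = math.floor(pts / step) * step   # floor(PTS/X)*X
        minimum = out[-1] + step if n else 0      # if(N,PREV_OUTPTS+X)
        out.append(max(snapped, minimum))
    return out


# Made-up arrival times in 1/1000ths of a second, with jitter and one gap.
jittery = [3, 47, 79, 125, 161, 198, 290, 301]
print(snap_timestamps(jittery, 40))
# [0, 40, 80, 120, 160, 200, 280, 320]
```

Packets that arrive a little early or late land on the 40 ms grid, so playback is smooth, while a real gap in arrival times (the jump to 280 here) is preserved, so the output does not fall behind the wallclock.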

Final command-line for recording:

Code:

ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 -use_wallclock_as_timestamps 1 \
-r 25 -i rtsp://login:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy -bsf:v setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))' \
-map 0:a -c:a copy -bsf:a setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))' \
-f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"

If live-streaming is also desired, the filters have to be set up as follows (assuming RTMP streaming via -f flv or fifo): for video, X has to be set to 1, and for audio it will depend on the source sampling rate. In my case the audio is 8000 Hz mono, and so X = 8000/FPS = 8000/25 = 320. Also, since the audio is being re-encoded, the expression has to be applied to the input data via -af (using the asetpts audio filter) instead of -bsf:a; the latter would apply it to the AAC-encoded stream instead, which we don't want.
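The different values of X all come from the same calculation: how many timestamp units elapse during one video frame. A minimal sketch, taking the units per second for each case from the description in this post rather than from FFmpeg itself (the function name is illustrative):

```python
def ticks_per_frame(ticks_per_second, fps):
    """X for the timestamp expression: timestamp units that elapse per video frame."""
    if ticks_per_second % fps:
        raise ValueError("frame duration is not a whole number of ticks")
    return ticks_per_second // fps


print(ticks_per_frame(1000, 25))  # 40  (recording: 1/1000ths of a second)
print(ticks_per_frame(8000, 25))  # 320 (audio filter: one unit per 8000 Hz sample)
print(ticks_per_frame(25, 25))    # 1   (streaming video: one unit per frame)
```

If the camera used a different frame rate or audio sampling rate, X would have to be recomputed accordingly.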

The resulting additional commands for streaming (assuming RTMP/FLV via -f fifo):

Code:

-map 0:v -c:v copy -bsf:v setts='max(floor(PTS),if(N,PREV_OUTPTS+1))' \
-map 0:a -c:a aac -ar 48000 -ac 2 -b:a 128k -af asetpts='max(floor(PTS/320)*320,if(N,PREV_OUTPTS+320))' \
-f fifo -fifo_format flv -drop_pkts_on_overflow 1 -attempt_recovery 1 -recover_any_error 1 -format_opts flvflags=no_duration_filesize rtmp://<STREAM_URL>

NB: the current version of FFmpeg in the FreeBSD ports collection (4.4.2) needs these two patches for the proposed solution to work:

https://github.com/FFmpeg/FFmpeg/commit ... f984ddafe5
https://github.com/FFmpeg/FFmpeg/commit ... 1d5162daff

The first patch fixes the expression parser erroring out, and the second one fixes the PREV_OUTPTS value always being equal to NOPTS. Also, with 4.4.2, "timeout" has to be replaced with "stimeout". If you have FFmpeg 5.0+, you do not need these fixes.

I'm still not sure if this solution is the proper one. So far, it's been running for many hours, and the resulting video is smooth as butter, and without any gradually increasing audio/video lag. But it looks extremely overcomplicated, not to mention it took me several days of researching and analyzing the video files to implement. Also, I don't know where the timestamp drift actually occurs - most signs point to the camera, but there's also the fact that some sort of conversion takes place depending on the output (e.g. segment/mkv measures timestamps in 1/1000ths of a second, but flv measures them in frames), and it might be possible that there's a bug somewhere in there.

For simplicity though, let's assume there's no bug, and the fault occurs at the source. We know that the audio is always on time, so why not use the timestamps of the audio packets for the video too? E.g. for each incoming video frame, assign it the timestamp of the latest audio packet received (not the wallclock time). The problem is that "setts" filters cannot interact with each other, so it's not possible to use them for this purpose.
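Purely to illustrate the idea (this is not something FFmpeg can do with "setts", as noted above), here is a sketch of what "use the latest audio timestamp for each video frame" would look like if packets could be processed together in arrival order. The packet representation and function name are hypothetical.

```python
def retime_video_to_audio(packets):
    """packets: (kind, pts) tuples in arrival order; kind is "audio" or "video".

    Audio packets pass through unchanged; each video packet is given the
    timestamp of the most recent audio packet seen so far (0 before any audio).
    """
    last_audio_pts = 0
    out = []
    for kind, pts in packets:
        if kind == "audio":
            last_audio_pts = pts
            out.append((kind, pts))
        else:
            out.append((kind, last_audio_pts))
    return out


# Made-up example: audio stays on a 40 ms grid while video timestamps drift.
arrivals = [("audio", 0), ("video", 0), ("audio", 40), ("video", 41),
            ("audio", 80), ("video", 83), ("audio", 120), ("video", 128)]
video_pts = [pts for kind, pts in retime_video_to_audio(arrivals) if kind == "video"]
print(video_pts)  # [0, 40, 80, 120]
```

A real implementation would also have to handle two video frames arriving between consecutive audio packets, which in this naive version would receive the same timestamp.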

I'm still no expert, so further comments, in case someone may have any, are still welcome.

Enjoy the stream! (Right now the title picture is outdated, I'll make a new one when I can. done!)

P.S. If the stream has ended, use permalink, or subscribe to the channel to receive a notification when the stream resumes.
