I apologize in advance for a lot of text: I wanted to provide as much information as I could about my problem, and I hope it might prove useful in finding a solution.
Synopsis: I'm recording live video from an IP camera that transmits the feed (video+audio) over RTSP. The recording is supposed to run continuously, with the footage cut into 10-minute segments. The server performing the recording does not have enough CPU power to transcode the stream, but that's not a problem since the camera provides already encoded video (H.264 or H.265). This is the command line I'm using:
Code:
ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 \
-i rtsp://login:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy -map 0:a -c:a copy \
-f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
Here's what happens while recording from this camera:
- Significant video lag gets accumulated over time. E.g. the first frame of the segment started at 12:00:00 may show the time 11:59:50 on the camera clock. The delay is only a few seconds at first, but it gets more and more noticeable with each recording. Note that the camera clock itself is not going out of sync with the actual time - the delay is reset upon restarting the recording process.
- As a result of the video lag, audio ends up ahead of video. The audio offset appears to track the video lag 1:1 and grows over time. E.g. if I knock on a solid surface with my finger in view of the camera, the sound comes much earlier than the video. This becomes easily noticeable after only some 10-15 minutes of continuous recording.
FFmpeg output from the command line above (banner, compilation flags, etc. omitted for brevity):
Code:
Input #0, rtsp, from 'rtsp://login:password@ip.ad.dre.ss:554/url':
  Metadata:
    title : RTSP Server
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 2560x1440, 25 fps, 25 tbr, 90k tbn
  Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Output #0, segment, to '%Y-%m-%dT%H-%M-%S.mkv':
  Metadata:
    title : RTSP Server
    encoder : Lavf59.16.100
  Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 2560x1440, q=2-31, 25 fps, 25 tbr, 1k tbn
  Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
0. The web UI of the camera itself plays back the feed flawlessly
Admittedly, it requires Internet Explorer (or Edge in IE mode) to work, but I've streamed the feed from there for hours, and there weren't any delays or desyncs at all. If it stays in sync there, then there must be a way to make it stay in sync during recording.
1. The lag only appears when copying the audio stream along with the video
If the audio stream is ignored, the video lag is gone - the camera clock time shown on the first frame of each segment is always on time (give or take a keyframe interval of 2 seconds).
Code:
# this does not have video lag accumulating over time
ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 \
-i rtsp://login:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy -f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
2. It is related to the camera dropping video frames
FFmpeg reports that the RTSP stream has a constant frame rate of 25 FPS, but it seems that the camera is not able to maintain it consistently at all times. A 10-minute segment is supposed to have 15000 frames (600 s × 25 FPS), but will sometimes contain about 14900 or even fewer. Despite that, no perceptible video playback issues are observed. My experiments strongly suggest that the audio/video lag increases only when a segment has fewer frames than expected. Note that this frame count discrepancy is a significant difference from other cameras that do not have sync issues: I picked some segments recorded from them at random, and they all had exactly 15000 frames.
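To put numbers on this: a segment with 14900 frames is 100 frames short, which at 25 FPS is 4 seconds of video. Below is a minimal sketch for checking a recorded segment, assuming ffprobe is installed. The helper names count_frames and shortfall_ms are made up for illustration, and the shortfall only equals the lag if every frame ends up spaced at the nominal 40 ms, which is a guess on my part and not something I have verified.

```shell
# Count video packets in a segment (normally one packet per frame
# for a stream-copied H.264 track).
count_frames() {
    ffprobe -v error -select_streams v:0 -count_packets \
        -show_entries stream=nb_read_packets -of csv=p=0 "$1"
}

# Milliseconds of video missing from a segment.
# $1 = actual frame count, $2 = segment length in seconds, $3 = nominal FPS
shortfall_ms() {
    expected=$(( $2 * $3 ))
    echo $(( (expected - $1) * 1000 / $3 ))
}

# With a real file (the name here is only an example):
#   shortfall_ms "$(count_frames 2024-01-01T12-00-00.mkv)" 600 25
shortfall_ms 14900 600 25   # prints 4000
```

A few segments like that in a row would add up to the roughly 10 seconds of lag described above.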
3. Can be rectified with -use_wallclock_as_timestamps 1, but the video stutters
This option, according to the FFmpeg documentation, ignores the timestamps in the input stream and constructs new ones based on the system clock. It completely eliminates the synchronization issues, but the recorded video exhibits constant stuttering with a period roughly equal to the keyframe interval (2 seconds). The stuttering is very noticeable, and I'd prefer to avoid it.
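For completeness, this is roughly where the option goes. -use_wallclock_as_timestamps is an input option, so it has to come before -i. The sketch below is the recording command from the top of the post with that one option added. The wrapper function name record_wallclock and the FFMPEG override (handy for a dry run with FFMPEG=echo) are only there for illustration.

```shell
# Recording command with wall-clock timestamps instead of the camera's own.
# $1 = RTSP URL; set FFMPEG=echo to print the arguments instead of running.
record_wallclock() {
    "${FFMPEG:-ffmpeg}" -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
        -use_wallclock_as_timestamps 1 \
        -rtsp_transport tcp -timeout 3000000 \
        -i "$1" \
        -map 0:v -c:v copy -map 0:a -c:a copy \
        -f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 \
        -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
}

# Usage: record_wallclock 'rtsp://login:password@ip.ad.dre.ss:554/url'
```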
4. -vsync cfr doesn't help (likely needs transcoding)
When it comes to multimedia, my knowledge is very hit-and-miss. I've never worked with it professionally, so most of what I know comes from the internet. And so I thought: if the video does not have enough frames, perhaps there is a way to duplicate some existing frames to match the expected frame rate? The FFmpeg documentation mentions a -vsync option that seemed like the way to go:
FFmpeg documentation wrote:
cfr (1)
Frames will be duplicated and dropped to achieve exactly the requested constant frame rate.
Alas, this did not work as I'd hoped, and having researched the topic a bit more, I think I now know why: FFmpeg likely needs to decode the video to be able to duplicate a frame, because not every frame in the encoded stream is self-contained. Needless to say, transcoding a 1440p video stream is completely out of the question for me; the server simply does not have that much CPU power to spare. Furthermore, it seems like a needless waste of energy, because the output from the camera comes already encoded.
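Since frames apparently can't be duplicated without decoding, the most that can be done in copy mode is to look at where the gaps are. Here is a sketch that lists timestamp gaps in a recorded segment, assuming ffprobe and awk are available and that the stream has no B-frames (so packets come in presentation order). The function name find_gaps and the 1.5× threshold are arbitrary choices for illustration.

```shell
# Print "<timestamp> <gap>" for every video packet that follows the previous
# one by more than 1.5 nominal frame intervals.
# stdin: one pts_time value (in seconds) per line; $1 = nominal FPS
find_gaps() {
    awk -v fps="$1" '
        NR > 1 && ($1 - prev) > 1.5 / fps { printf "%.3f %.3f\n", $1, $1 - prev }
        { prev = $1 }
    '
}

# Usage (the segment file name is only an example):
#   ffprobe -v error -select_streams v:0 -show_entries packet=pts_time \
#       -of csv=p=0 segment.mkv | find_gaps 25
```

If the camera really drops frames, each drop should show up as a gap of about 80 ms or more instead of the usual 40 ms.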
5. Sync audio to video instead of video to audio?
It doesn't take nearly as much CPU time to re-encode audio compared to video, and if I want to live-stream the feed e.g. to YouTube (and in this particular case, I do), the pcm_mulaw codec is not supported there anyhow. For live-streaming, I use AAC with the following options:
Code:
-c:a aac -ar 48000 -ac 2 -b:a 128k
Code:
-af aresample=async=1
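Combined with stream-copied video, the two fragments above would give a command along these lines. This is only a sketch: it reuses the segment options from the recording command, record_resync is a made-up wrapper name, and the FFMPEG override exists only to allow a dry run with FFMPEG=echo. With async=1, aresample pads or trims the audio so that it follows its timestamps, and only the audio track is re-encoded.

```shell
# Copy the video untouched, re-encode only the audio with timestamp-driven
# padding/trimming. $1 = RTSP URL.
record_resync() {
    "${FFMPEG:-ffmpeg}" -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
        -rtsp_transport tcp -timeout 3000000 \
        -i "$1" \
        -map 0:v -c:v copy \
        -map 0:a -af aresample=async=1 -c:a aac -ar 48000 -ac 2 -b:a 128k \
        -f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 \
        -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
}

# Usage: record_resync 'rtsp://login:password@ip.ad.dre.ss:554/url'
```

Whether this removes the drift depends on the timestamps the camera sends, so treat it as something to test and not as a known fix.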
6. Conclusion
And that's it so far. I've been going crazy about this for days, but I'm still not sure what I might be missing here. I never even thought that such a problem could arise in the first place, since both the video and audio come from a single source, and I still can't exactly understand what causes them to go out of sync. So far, the only thing I know for sure is that it happens when the camera doesn't provide enough video frames to maintain a constant FPS.
I will greatly appreciate any help or advice from anyone who has experience with FFmpeg, as long as it doesn't involve transcoding the video stream. I'm not entirely sure, but I think it should be possible to solve my problem without transcoding the audio either (see #0); still, it's not a big deal if I have to do it. If necessary, I can also provide sample video files (but beware that they are huge!), or analyze the recorded segments and/or the stream itself with ffprobe or other utilities, provided that they can be downloaded for free from a trusted source.
Thank you very much.