How Video Syncs

So this whole time, we’ve had an essentially useless movie player. It plays the video, yeah, and it plays the audio, yeah, but it’s not quite yet what we would call a movie. So what do we do?

PTS and DTS

Fortunately, both the audio and video streams have the information about how fast and when you are supposed to play them inside of them. Audio streams have a sample rate, and the video streams have a frames per second value. However, if we simply synced the video by just counting frames and multiplying by frame rate, there is a chance that it will go out of sync with the audio. Instead,
packets from the stream might have what is called a decoding time stamp (DTS) and a presentation time stamp (PTS). To understand these two values, you need to know about the way movies are stored. Some formats, like MPEG, use what they call “B” frames (B stands for “bidirectional”). The two other kinds of frames are called “I” frames and “P” frames (“I” for “intra” and “P” for “predicted”). I frames contain a full image. P frames depend upon previous I and P frames and are like diffs or deltas. B frames are the same as P frames, but depend upon information found in frames that are displayed both before and after them! This explains why we might not have a finished frame after we call avcodec_decode_video2.

So let’s say we had a movie, and the frames were displayed like: I B B P. Now, we need to know the information in P before we can display either B frame. Because of this, the frames might be stored like this: I P B B. This is why we have a decoding timestamp and a presentation timestamp on each frame. The decoding timestamp tells us when we need to decode something, and the presentation time stamp tells us when we need to display something. So, in this case, our stream might look like this:

   PTS: 1 4 2 3
   DTS: 1 2 3 4
Stream: I P B B

Generally the PTS and DTS will only differ when the stream we are playing has B frames in it.
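If you're curious, you can watch this happen yourself. Here's a minimal probe sketch, assuming the pFormatCtx that the earlier tutorials opened with avformat_open_input (it isn't part of the player, it just prints both stamps for every demuxed packet):

#include <stdio.h>
#include <inttypes.h>   // PRId64 for printing int64_t timestamps

AVPacket packet;
// For a stream with B frames, pts (display order) and dts (decode order)
// will diverge; without B frames the two columns match.
while(av_read_frame(pFormatCtx, &packet) >= 0) {
  printf("stream %d: pts=%" PRId64 "  dts=%" PRId64 "\n",
         packet.stream_index, packet.pts, packet.dts);
  av_free_packet(&packet); // av_packet_unref() in newer FFmpeg
}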

When we get a packet from av_read_frame(), it will contain the PTS and DTS values for the information inside that packet. But what we really want is the PTS of our newly decoded raw frame, so we know when to display it. Fortunately, FFmpeg supplies us with a "best effort" timestamp, which you can get via av_frame_get_best_effort_timestamp().

Synching

Now, it’s all well and good to know when we’re supposed to show a particular video frame, but how do we actually do so? Here’s the idea: after we show a frame, we figure out when the next frame should be shown. Then we simply set a new timeout to refresh the video again after that amount of time. As you might expect, we check the value of the PTS of the next frame against the system clock to see how long our timeout should be. This approach works, but there are two issues that need to be dealt with.
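As a reminder of how that timeout machinery works, schedule_refresh in the earlier tutorials is a thin wrapper around an SDL timer that pushes a user-defined event (FF_REFRESH_EVENT) back to the main loop; a rough sketch looks like this:

/* Callback for SDL_AddTimer: fires once after 'interval' ms on SDL's
   timer thread and posts a refresh event back to the main loop. */
static Uint32 sdl_refresh_timer_cb(Uint32 interval, void *opaque) {
  SDL_Event event;
  event.type = FF_REFRESH_EVENT;  /* user-defined SDL event */
  event.user.data1 = opaque;      /* our VideoState */
  SDL_PushEvent(&event);
  return 0;  /* returning 0 makes the timer one-shot */
}

/* Schedule a video refresh in 'delay' ms. */
static void schedule_refresh(VideoState *is, int delay) {
  SDL_AddTimer(delay, sdl_refresh_timer_cb, is);
}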

First is the issue of knowing when the next PTS will be. Now, you might think that we can just add the video rate to the current PTS – and you’d be mostly right. However, some kinds of video call for frames to be repeated. This means that we’re supposed to repeat the current frame a certain number of times. This could cause the program to display the next frame too soon. So we need to
account for that.

The second issue is that as the program stands now, the video and the audio are chugging away happily, not bothering to sync at all. We wouldn’t have to worry about that if everything worked perfectly. But your computer isn’t perfect, and a lot of video files aren’t, either. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (like your computer). For now, we’re going to sync the video to the audio.

Coding it: getting the frame PTS

Now let’s get into the code to do all this. We’re going to need to add some more members to our big struct, but we’ll do this as we need to. First let’s look at our video thread. Remember, this is where we pick up the packets that were put on the queue by our decode thread. What we need to do in this part of the code is get the PTS of the frame given to us by avcodec_decode_video2. We check the DTS of the last packet processed and, when it’s valid, use the best-effort timestamp we just talked about, which is pretty easy:

double pts;

for(;;) {
  if(packet_queue_get(&is->videoq, packet, 1) < 0) {
    // means we quit getting packets
    break;
  }
  pts = 0;
  // Decode video frame
  len1 = avcodec_decode_video2(is->video_st->codec, pFrame,
                               &frameFinished, packet);
  if(packet->dts != AV_NOPTS_VALUE) {
    pts = av_frame_get_best_effort_timestamp(pFrame);
  } else {
    pts = 0;
  }
  pts *= av_q2d(is->video_st->time_base);

We set the PTS to 0 if we can’t figure out what it is.

Well, that was easy. A technical note: You may have noticed we’re using int64 for the PTS. This is because the PTS is stored as an integer. This value is a timestamp that corresponds to a measurement of time in that stream’s time_base unit. For example, if a stream has 24 frames per second, a PTS of 42 is going to indicate that the frame should go where the 42nd frame would be if we had a frame every 1/24 of a second (certainly not necessarily true).

We can convert this value to seconds by dividing by the framerate. The time_base value of the stream is going to be 1/framerate (for fixed-fps content), so to get the PTS in seconds, we multiply by the time_base.
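As a concrete sketch of that conversion (pFrame and is->video_st are the variables the tutorial already uses):

// Raw PTS is an int64_t counted in time_base units...
int64_t raw_pts = av_frame_get_best_effort_timestamp(pFrame);
// ...so multiplying by time_base yields seconds: with a time_base
// of 1/24, a PTS of 42 gives 42 * (1/24) = 1.75 s.
double pts_seconds = raw_pts * av_q2d(is->video_st->time_base);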

Coding: Synching and using the PTS

So now we’ve got our PTS all set. Now we’ve got to take care of the two synchronization problems we talked about above. We’re going to define a function called synchronize_video that will update the PTS to be in sync with everything. This function will also finally deal with cases where we don’t get a PTS value for our frame. At the same time we need to keep track of when the next frame is expected so we can set our refresh rate properly. We can accomplish this by using an internal video_clock value which keeps track
of how much time has passed according to the video. We add this value to our big struct.

typedef struct VideoState {
  double video_clock; // pts of last decoded frame / predicted pts of next decoded frame

Here’s the synchronize_video function, which is pretty self-explanatory:

double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {

  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}

You’ll notice we account for repeated frames in this function, too: FFmpeg documents repeat_pict as meaning the frame must be delayed an extra repeat_pict / (2 * fps) seconds, and since frame_delay starts out as 1/fps, that is exactly what the frame_delay += src_frame->repeat_pict * (frame_delay * 0.5) line adds.

Now let’s get our proper PTS and queue up the frame using queue_picture, adding a new pts argument:

// Did we get a video frame?
if(frameFinished) {
  pts = synchronize_video(is, pFrame, pts);
  if(queue_picture(is, pFrame, pts) < 0) {
    break;
  }
}

The only thing that changes about queue_picture is that we save that pts value to the VideoPicture structure that we queue up. So we have to add a pts variable to the struct and add a line of code:

typedef struct VideoPicture {
  ...
  double pts;
}

int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
  ... stuff ...
  if(vp->bmp) {
    ... convert picture ...
    vp->pts = pts;
    ... alert queue ...
  }

So now we’ve got pictures lining up onto our picture queue with proper PTS values, so let’s take a look at our video refreshing function. You may recall from last time that we just faked it and put a refresh of 80ms. Well, now we’re going to find out how to actually figure it out.

Our strategy is going to be to predict the time of the next PTS by simply measuring the time between the previous pts and this one. At the same time, we need to sync the video to the audio. We’re going to make an audio clock: an internal value that keeps track of what position the audio we’re playing is at. It’s like the digital readout on any mp3 player. Since we’re synching the video to the audio, the video thread uses this value to figure out if it’s too far ahead or too far behind.

We’ll get to the implementation later; for now let’s assume we have a get_audio_clock function that will give us the time on the audio clock. Once we have that value, though, what do we do if the video and audio are out of sync? It would be silly to simply try and leap to the correct packet through seeking or something. Instead, we’re just going to adjust the value we’ve calculated for the next refresh: if the PTS is too far behind the audio time, we simply refresh as quickly as possible; if the PTS is too far ahead of the audio time, we double our calculated delay. Now that we have our adjusted refresh time, or delay, we’re going to compare that with our computer’s clock by keeping a running frame_timer. This frame timer will sum up all of our calculated delays while playing the movie. In other words, this frame_timer is what time it should be when we display the next frame. We simply add the new delay to the frame timer, compare it to the time on our computer’s clock, and use that value to schedule the next refresh. This might be a bit confusing, so study the code carefully:

void video_refresh_timer(void *userdata) {

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;

  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      if(delay <= 0 || delay >= 1.0) {
        /* if incorrect delay, use previous one */
        delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      /* Note: the premise is that audio and video play along two separate
         timelines which ideally stay in sync, so their PTS values
         (converted to seconds) should differ only slightly. */
      ref_clock = get_audio_clock(is);
      /* diff is positive when the video is ahead of the audio and negative
         when it is behind; if its absolute value exceeds the threshold,
         we have lost sync. */
      diff = vp->pts - ref_clock;

      /* Skip or repeat the frame. Take delay into account.
         FFPlay still doesn't "know if this is the best guess." */
      /* sync_threshold is the point at which we start correcting; delay is
         the interval between two frames; AV_SYNC_THRESHOLD is the minimum
         threshold, 0.01 s (10 ms). */
      sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
        if(diff <= -sync_threshold) {
          /* diff is negative: video is slow, show the next frame at once */
          delay = 0;
        } else if(diff >= sync_threshold) {
          /* diff is positive: video is fast, delay the next frame */
          delay = 2 * delay;
        }
      }

      /* Note: why keep a frame_timer instead of passing the computed delay
         straight to the timer? frame_timer effectively introduces a third
         timeline, the local clock. frame_timer is the time at which the
         next frame should be shown, but displaying a frame also incurs
         costs the delay doesn't account for: copying image data, thread
         scheduling, timer inaccuracy, and so on. Comparing against the
         local clock (taken to be accurate) corrects for them: frame_timer
         minus av_gettime() is the delay we actually need, actual_delay.
         frame_timer is initialized to av_gettime(), and it tracks the next
         display time because, starting from frame 0, each frame's computed
         delay is accumulated into it; delays derived from the audio/video
         PTS are unaffected by those other costs. */
      is->frame_timer += delay;
      /* compute the REAL delay */
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if(actual_delay < 0.010) {
        /* Really it should skip the picture instead */
        actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));

      /* show the picture! */
      video_display(is);

      /* update queue for next picture! */
      if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
        is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}

There are a few checks we make: first, we make sure that the delay between the PTS and the previous PTS makes sense. If it doesn’t, we just guess and use the last delay. Next, we make sure we have a synch threshold because things are never going to be perfectly in synch. ffplay uses 0.01 for its value. We also make sure that the synch threshold is never smaller than the gaps in between PTS values. Finally, we make the minimum refresh value 10 milliseconds. (Really we should skip the frame here, but we’re not going to bother.)

We added a bunch of variables to the big struct so don’t forget to check the code. Also, don’t forget to initialize the frame timer and the initial previous frame delay in stream_component_open:

is->frame_timer = (double)av_gettime() / 1000000.0;
is->frame_last_delay = 40e-3; // 40 ms

Synching: The Audio Clock

Now it’s time for us to implement the audio clock. We can update the clock time in our audio_decode_frame function, which is where we decode the audio. Now, remember that we don’t always process a new packet every time we call this function, so there are two places we have to update the clock. The first place is where we get the new packet: we simply set the audio clock to the packet’s PTS. Then, if a packet has multiple frames, we keep the audio clock up to date by counting the number of samples and multiplying by the given samples-per-second rate. So once we have the packet:

/* if update, update the audio clock w/pts */
if(pkt->pts != AV_NOPTS_VALUE) {
  is->audio_clock = av_q2d(is->audio_st->time_base) * pkt->pts;
}

And once we are processing the packet:

/* Keep audio_clock up-to-date */
pts = is->audio_clock;
*pts_ptr = pts;
n = 2 * is->audio_st->codec->channels;
is->audio_clock += (double)data_size /
  (double)(n * is->audio_st->codec->sample_rate);

A few fine details: the prototype of the function has changed to include pts_ptr, so make sure you change that. pts_ptr is a pointer we use to inform audio_callback of the PTS of the audio packet. This will be used next time for synchronizing the audio with the video.
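For reference, the changed prototype should look roughly like this (a sketch; the buffer parameters are the ones from the earlier tutorials):

/* Decodes audio into audio_buf, returns the number of bytes produced,
   and hands back, through pts_ptr, the audio clock value for that data
   so audio_callback can use it for syncing. */
int audio_decode_frame(VideoState *is, uint8_t *audio_buf,
                       int buf_size, double *pts_ptr);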

Now we can finally implement our get_audio_clock function. It’s not as simple as getting the is->audio_clock value, though. Notice that we set the audio PTS every time we process it, but if you look at the audio_callback function, it takes time to move all the data from our audio packet into our output buffer. That means that the value in our audio clock could be too far ahead. So we have to check how much we have left to write. Here’s the complete code:

double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size, bytes_per_sec, n;

  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  n = is->audio_st->codec->channels * 2;
  if(is->audio_st) {
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}

You should be able to tell why this function works by now.
So that’s it! Go ahead and compile it:

g++ -std=c++14 -o tutorial07 tutorial07.cpp -I/INCLUDE_PATH -L/LIB_PATH -lavutil -lavformat -lavcodec -lswscale -lswresample -lavdevice -lz -lm -lpthread -ldl

And finally! You can watch a movie on your own movie player. Next time we’ll look at audio synching, and in the tutorial after that we’ll talk about seeking.

Source code: tutorial07.cpp
