Commit df840cd: add remux chapter
LOG: Frame 5 (type=B, size=6253 bytes) pts 10000 key_frame 0 [DTS 5]
LOG: Frame 6 (type=P, size=34992 bytes) pts 11000 key_frame 0 [DTS 1]
```

## Chapter 2 - remuxing

Remuxing is the act of changing from one format (container) to another. For instance, we can change an MP4 video to an [MPEG-TS](https://en.wikipedia.org/wiki/MPEG_transport_stream) one without much pain using FFmpeg:

```bash
ffmpeg -i input.mp4 -c copy output.ts
```

It'll demux the mp4 but it won't decode or encode it (`-c copy`), and in the end, it'll mux it into an `mpegts` file. If you don't provide the format with `-f`, ffmpeg will try to guess it based on the file's extension.
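That extension-based guessing can be pictured in plain C. This is a hypothetical sketch of the idea (libav's real lookup is `av_guess_format`, which consults its registered muxers); the two-entry table below is just an assumption for illustration:

```c
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: map a filename extension to a muxer (format) name.
// libav's av_guess_format does the real work against its registered muxers;
// this table covers only the two containers used in this chapter.
const char *guess_muxer_by_extension(const char *filename) {
  const char *dot = strrchr(filename, '.');
  if (!dot)
    return NULL; // no extension, nothing to guess from
  if (strcmp(dot, ".ts") == 0)
    return "mpegts";
  if (strcmp(dot, ".mp4") == 0)
    return "mp4";
  return NULL; // extension unknown to this sketch
}
```

With `output.ts` this sketch picks the `mpegts` muxer, which is why the command above needs no explicit `-f mpegts`.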

The general usage of FFmpeg or libav follows a pattern/architecture or workflow:

* **[protocol layer](https://ffmpeg.org/doxygen/trunk/protocols_8c.html)** - it accepts an `input` (a `file` for instance, but it could be a `rtmp` or `HTTP` input as well)
* **[format layer](https://ffmpeg.org/doxygen/trunk/group__libavf.html)** - it `demuxes` its content, revealing mostly metadata and its streams
* **[codec layer](https://ffmpeg.org/doxygen/trunk/group__libavc.html)** - it `decodes` its compressed streams data <sup>*optional*</sup>
* **[pixel layer](https://ffmpeg.org/doxygen/trunk/group__lavfi.html)** - it can also apply some `filters` to the raw frames (like resizing) <sup>*optional*</sup>
* and then it does the reverse path
* **[codec layer](https://ffmpeg.org/doxygen/trunk/group__libavc.html)** - it `encodes` (or `re-encodes` or even `transcodes`) the raw frames <sup>*optional*</sup>
* **[format layer](https://ffmpeg.org/doxygen/trunk/group__libavf.html)** - it `muxes` (or `remuxes`) the raw streams (the compressed data)
* **[protocol layer](https://ffmpeg.org/doxygen/trunk/protocols_8c.html)** - and finally the muxed data is sent to an `output` (another file or maybe a network remote server)

![ffmpeg libav workflow](/img/ffmpeg_libav_workflow.jpeg)

> This graph is strongly inspired by [Leixiaohua's](http://leixiaohua1020.github.io/#ffmpeg-development-examples) and [Slhck's](https://slhck.info/ffmpeg-encoding-course/#/9) works.
Now let's code an example using libav to produce the same effect as `ffmpeg -i input.mp4 -c copy output.ts`.

We're going to read from an input (`input_format_context`) and write it to another output (`output_format_context`).

```c
AVFormatContext *input_format_context = NULL;
AVFormatContext *output_format_context = NULL;
```
We start by doing the usual: allocating memory and opening the input format. For this specific case, we're going to open an input file and allocate memory for an output file.

```c
if ((ret = avformat_open_input(&input_format_context, in_filename, NULL, NULL)) < 0) {
  fprintf(stderr, "Could not open input file '%s'", in_filename);
  goto end;
}
if ((ret = avformat_find_stream_info(input_format_context, NULL)) < 0) {
  fprintf(stderr, "Failed to retrieve input stream information");
  goto end;
}

avformat_alloc_output_context2(&output_format_context, NULL, NULL, out_filename);
if (!output_format_context) {
  fprintf(stderr, "Could not create output context\n");
  ret = AVERROR_UNKNOWN;
  goto end;
}
```

We're going to remux only the video, audio and subtitle stream types, so we'll hold the indexes of the streams we'll use in an array.

```c
number_of_streams = input_format_context->nb_streams;
streams_list = av_mallocz_array(number_of_streams, sizeof(*streams_list));
```
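`av_mallocz_array` hands back zero-initialized memory, so every entry of `streams_list` starts at 0 before we fill it. In standard C the equivalent allocation is `calloc`; a minimal sketch of the semantics (not libav's implementation):

```c
#include <stdlib.h>

// Zero-initialized array allocation, analogous to what av_mallocz_array
// gives us: every index starts at 0 until the stream loop assigns it.
int *alloc_streams_list(unsigned int number_of_streams) {
  return calloc(number_of_streams, sizeof(int));
}
```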
Just after we've allocated the required memory, we're going to loop through all the streams, and for each one we need to create a new output stream in our output format context, using the [avformat_new_stream](https://ffmpeg.org/doxygen/trunk/group__lavf__core.html#gadcb0fd3e507d9b58fe78f61f8ad39827) function. Notice that we're marking all the streams that aren't video, audio or subtitle so we can skip them later.

```c
for (i = 0; i < input_format_context->nb_streams; i++) {
  AVStream *out_stream;
  AVStream *in_stream = input_format_context->streams[i];
  AVCodecParameters *in_codecpar = in_stream->codecpar;
  if (in_codecpar->codec_type != AVMEDIA_TYPE_AUDIO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_VIDEO &&
      in_codecpar->codec_type != AVMEDIA_TYPE_SUBTITLE) {
    streams_list[i] = -1;
    continue;
  }
  streams_list[i] = stream_index++;
  out_stream = avformat_new_stream(output_format_context, NULL);
  if (!out_stream) {
    fprintf(stderr, "Failed allocating output stream\n");
    ret = AVERROR_UNKNOWN;
    goto end;
  }
  ret = avcodec_parameters_copy(out_stream->codecpar, in_codecpar);
  if (ret < 0) {
    fprintf(stderr, "Failed to copy codec parameters\n");
    goto end;
  }
}
```
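The index bookkeeping in that loop is easy to get wrong, so here it is in isolation. This is a hypothetical helper, with plain ints standing in for the `AVMEDIA_TYPE_*` constants; it fills `streams_list` with the new output index for kept streams and -1 for skipped ones:

```c
// Stand-ins for the AVMEDIA_TYPE_* constants (an assumption for this sketch).
enum { TYPE_VIDEO, TYPE_AUDIO, TYPE_SUBTITLE, TYPE_DATA, TYPE_ATTACHMENT };

// Sketch of the loop's bookkeeping: keep video/audio/subtitle streams,
// assigning them consecutive output indexes, and mark the rest with -1 so
// the packet loop can skip them. Returns how many streams were kept.
int map_streams(const int *codec_types, int nb_streams, int *streams_list) {
  int stream_index = 0;
  for (int i = 0; i < nb_streams; i++) {
    if (codec_types[i] != TYPE_VIDEO &&
        codec_types[i] != TYPE_AUDIO &&
        codec_types[i] != TYPE_SUBTITLE) {
      streams_list[i] = -1;
      continue;
    }
    streams_list[i] = stream_index++;
  }
  return stream_index;
}
```

For an input with streams `[video, data, audio, subtitle]` this yields `[0, -1, 1, 2]`: the data stream is dropped and the remaining three get consecutive output indexes.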

Now we can create the output file.

```c
if (!(output_format_context->oformat->flags & AVFMT_NOFILE)) {
  ret = avio_open(&output_format_context->pb, out_filename, AVIO_FLAG_WRITE);
  if (ret < 0) {
    fprintf(stderr, "Could not open output file '%s'", out_filename);
    goto end;
  }
}

ret = avformat_write_header(output_format_context, NULL);
if (ret < 0) {
  fprintf(stderr, "Error occurred when opening output file\n");
  goto end;
}
```

After that, we can copy the streams, packet by packet, from our input to our output streams. We'll loop while there are packets (`av_read_frame`), and for each packet we need to re-calculate the PTS and DTS before finally writing it (`av_interleaved_write_frame`) to our output format context.

```c
while (1) {
  AVStream *in_stream, *out_stream;
  ret = av_read_frame(input_format_context, &packet);
  if (ret < 0)
    break;
  in_stream = input_format_context->streams[packet.stream_index];
  if (packet.stream_index >= number_of_streams || streams_list[packet.stream_index] < 0) {
    av_packet_unref(&packet);
    continue;
  }
  packet.stream_index = streams_list[packet.stream_index];
  out_stream = output_format_context->streams[packet.stream_index];
  /* copy packet */
  packet.pts = av_rescale_q_rnd(packet.pts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.dts = av_rescale_q_rnd(packet.dts, in_stream->time_base, out_stream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
  packet.duration = av_rescale_q(packet.duration, in_stream->time_base, out_stream->time_base);
  // https://ffmpeg.org/doxygen/trunk/structAVPacket.html#ab5793d8195cf4789dfb3913b7a693903
  packet.pos = -1;

  // https://ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga37352ed2c63493c38219d935e71db6c1
  ret = av_interleaved_write_frame(output_format_context, &packet);
  if (ret < 0) {
    fprintf(stderr, "Error muxing packet\n");
    break;
  }
  av_packet_unref(&packet);
}
```
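The re-calculation above is just a change of time base: `av_rescale_q(a, bq, cq)` computes `a * bq / cq` with rounding. A minimal sketch of that arithmetic, ignoring the overflow protection and the rounding-mode flags the real `av_rescale_q_rnd` handles:

```c
#include <stdint.h>

// Sketch of time-base rescaling: `a` ticks in time base b_num/b_den become
// a * (b_num/b_den) / (c_num/c_den) ticks in time base c_num/c_den.
// Rounds to nearest for non-negative values; the real function also guards
// against overflow and supports several rounding modes.
int64_t rescale_ts(int64_t a, int64_t b_num, int64_t b_den,
                   int64_t c_num, int64_t c_den) {
  int64_t num = a * b_num * c_den;
  int64_t den = b_den * c_num;
  return (num + den / 2) / den;
}
```

For example, a PTS of 10000 in a 1/1000 (millisecond) time base becomes 900000 in the 1/90000 time base that MPEG-TS uses.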

To finalize, we need to write the stream trailer to the output media file with the [av_write_trailer](https://ffmpeg.org/doxygen/trunk/group__lavf__encoding.html#ga7f14007e7dc8f481f054b21614dfec13) function.

```c
av_write_trailer(output_format_context);
```

Now we're ready to test it, and the first test will be a format (video container) conversion from an MP4 to an MPEG-TS video file. We're basically replicating the command line `ffmpeg -i input.mp4 -c copy output.ts` with libav.

```bash
make run_remuxing_ts
```

It's working!!! Don't you trust me?! You shouldn't; we can check it with `ffprobe`:

```bash
ffprobe -i remuxed_small_bunny_1080p_60fps.ts

Input #0, mpegts, from 'remuxed_small_bunny_1080p_60fps.ts':
  Duration: 00:00:10.03, start: 0.000000, bitrate: 2751 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 90k tbn, 120 tbc
    Stream #0:1[0x101]: Audio: ac3 ([129][0][0][0] / 0x0081), 48000 Hz, 5.1(side), fltp, 320 kb/s
```

To sum up what we did here in a graph, we can revisit our initial [idea about how libav works](https://github.com/leandromoreira/ffmpeg-libav-tutorial#ffmpeg-libav-architecture) but showing that we skipped the codec part.

![remuxing libav components](/img/remuxing_libav_components.png)

Before we end this chapter I'd like to show an important part of the remuxing process: **you can pass options to the muxer**. Let's say we want to deliver the [MPEG-DASH](https://developer.mozilla.org/en-US/docs/Web/Apps/Fundamentals/Audio_and_video_delivery/Setting_up_adaptive_streaming_media_sources#MPEG-DASH_Encoding) format; for that we need to use [fragmented mp4](https://stackoverflow.com/a/35180327) (sometimes referred to as `fmp4`) instead of MPEG-TS or plain MPEG-4.

With the [command line we can do that easily](https://developer.mozilla.org/en-US/docs/Web/API/Media_Source_Extensions_API/Transcoding_assets_for_MSE#Fragmenting).

```bash
ffmpeg -i non_fragmented.mp4 -movflags frag_keyframe+empty_moov+default_base_moof fragmented.mp4
```

Almost as easy as the command line, the libav version just requires us to pass the options when writing the output header, just before the packet copying.

```c
AVDictionary* opts = NULL;
av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov+default_base_moof", 0);
ret = avformat_write_header(output_format_context, &opts);
```

We can now generate this fragmented mp4 file:

```bash
make run_remuxing_fragmented_mp4
```

But to make sure that I'm not lying to you, you can use the amazing site/tool [gpac/mp4box.js](http://download.tsi.telecom-paristech.fr/gpac/mp4box.js/filereader.html) or the site [http://mp4parser.com/](http://mp4parser.com/) to see the differences. First load up the "common" mp4.

![mp4 boxes](/img/boxes_normal_mp4.png)

As you can see it has a single `mdat` atom/box, **this is the place where the video and audio frames are**. Now load the fragmented mp4 to see how it spreads the `mdat` boxes.

![](/img/boxes_fragmente_mp4.png)
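What those tools display can be sketched with a few lines of C: an mp4 (ISO-BMFF) file is a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type. The walker below is a toy sketch (it assumes plain 32-bit sizes; a real parser must also handle the 64-bit `size == 1` and to-end-of-file `size == 0` cases):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Toy walk over the top-level boxes of an ISO-BMFF buffer. Each box header
// is a 4-byte big-endian size followed by a 4-byte type such as "moov" or
// "mdat". Writes up to `max` type names (4 chars + NUL) into `names` and
// returns how many boxes were seen.
int list_top_level_boxes(const uint8_t *buf, size_t len,
                         char names[][5], int max) {
  size_t pos = 0;
  int count = 0;
  while (pos + 8 <= len && count < max) {
    uint32_t size = (uint32_t)buf[pos] << 24 | (uint32_t)buf[pos + 1] << 16 |
                    (uint32_t)buf[pos + 2] << 8 | (uint32_t)buf[pos + 3];
    memcpy(names[count], buf + pos + 4, 4);
    names[count][4] = '\0';
    count++;
    if (size < 8 || pos + size > len)
      break; // 64-bit/zero sizes and truncated boxes are out of scope here
    pos += size;
  }
  return count;
}
```

Running it over a regular mp4 would report one `mdat`, while the fragmented version shows several `moof`/`mdat` pairs.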
