VP9 is a next-generation video compression format developed by the WebM Project .
The reference software implementation for VP9 encoding and decoding can be found in the libvpx
library. Similar to AV1, significant effort has been made to optimize the libvpx
implementation on Arm platforms. This includes making use of the Neon and SVE2 vector extensions available on Arm Neoverse platforms. You can learn more on
Google for Developers
.
Download the libvpx
source code:
git clone https://chromium.googlesource.com/webm/libvpx
cd libvpx
The configuration allows for different options to be enabled and disabled.
To see the options run:
./configure --help
Configure and build with VP8 disabled, VP9 enabled, and unit tests enabled:
./configure --disable-vp8 --enable-vp9 --enable-vp9-highbitdepth --enable-unit-tests
make -j$(nproc)
For more information, refer to the README .
You can run libvpx
unit tests using make
. For example, to run the decoder test run:
make testdata
make test TEST_NAME=vp9_decoder_test
The output is:
Note: Google Test filter = *-:NEON_I8MM.*:NEON_I8MM/*:SVE.*:SVE/*:SVE2.*:SVE2/*
Note: This is test shard 1 of 10.
[==========] Running 1592 tests from 132 test suites.
[----------] Global test environment set-up.
[----------] 1 test from ByteAlignmentTest
[ RUN ] ByteAlignmentTest.SwitchByteAlignment
[ OK ] ByteAlignmentTest.SwitchByteAlignment (5248 ms)
[----------] 1 test from ByteAlignmentTest (5248 ms total)
[----------] 4 tests from EncodeAPI
[ RUN ] EncodeAPI.SetRoi
[ OK ] EncodeAPI.SetRoi (4 ms)
[ RUN ] EncodeAPI.DynamicDeadlineChange
[ OK ] EncodeAPI.DynamicDeadlineChange (4 ms)
[ RUN ] EncodeAPI.Buganizer311294795
[ OK ] EncodeAPI.Buganizer311294795 (212 ms)
[ RUN ] EncodeAPI.Buganizer329179808RowMT1BitDepth10
[ OK ] EncodeAPI.Buganizer329179808RowMT1BitDepth10 (26 ms)
[----------] 4 tests from EncodeAPI (248 ms total)
< output omitted >
You can benchmark either the encoding or decoding processes. In this section, you will focus on encoding.
For Performance benchmarking, you can select video encoding either on-demand or live stream.
The video files are large so adjust the paths to the y4m files in the commands below. You can also copy them to the current directory.
For on-demand encoding, you can experiment different number of processors and monitor performance.
For example, run with --good
and use the --cpu-used
argument to vary the number from 2 to 6.
The naming of --cpu-used
flag is to trade-off encoding speed for resulting video quality/compression, not to determine how many CPUs to use for parallel encoding. Lower numbers indicate better quality but longer encoding time. For VoD a reasonable range of quality settings is 0-4 inclusive.
Run standard bit depth and change the CPU count, then see the results using:
./vpxenc --good --cpu-used=2 --profile=0 --bit-depth=8 -o output.mkv Bosphorus_1920x1080_120fps_420_8bit_YUV.y4m
Try the above command with different --cpu-used
values:
./vpxenc --good --cpu-used=6 --profile=0 --bit-depth=8 -o output.mkv Bosphorus_1920x1080_120fps_420_8bit_YUV.y4m
You can do the same for high bit depth:
./vpxenc --good --cpu-used=2 --profile=2 --bit-depth=10 -o output.mkv Bosphorus_3840x2160_120fps_420_10bit_YUV.y4m
For live stream encoding, you can experiment different number of processors and monitor performance using the --cpus-used
in the range from 5 to 9.
The naming of --cpu-used
flag is to trade-off encoding speed for resulting video quality/compression, not to determine how many CPUs to use for parallel encoding. Lower numbers indicate better quality but longer encoding time. For live stream a reasonable range of quality settings is 5-9.
For standard bit depth with 8 CPUs, run:
./vpxenc --rt --cpu-used=8 --profile=0 --bit-depth=8 -o output.mkv Bosphorus_1920x1080_120fps_420_8bit_YUV.y4m
For high bit depth, run:
./vpxenc --rt --cpu-used=8 --profile=2 --bit-depth=10 -o output.mkv Bosphorus_3840x2160_120fps_420_10bit_YUV.y4m
The encoding frame rate (frames per second) for the video files is output at the end of each run.
Shown below is the example output from running the libvpx
codec on the 8-bit FHD sample video file with different –cpu-used settings to compare the encoding time.
The output from running the codec with --cpu-used=2
is similar to:
–cpu-used=2
Pass 1/2 frame 600/601 129816B 1730b/f 51926b/s 29486 ms (20.35 fps)
Pass 2/2 frame 600/576 625752B 135925 ms 4.41 fps [ETA 0:00:05] 196F 260F 220F 227F 242F 166F 66F 4066F 316F 386F 337F 24Pass 2/2 frame 600/600 639729B 8529b/f 255891b/s 109956 ms (5.46 fps)
The output from running the codec with --cpu-used=6
is similar to:
Pass 1/2 frame 600/601 129816B 1730b/f 51926b/s 29446 ms (20.38 fps)
Pass 2/2 frame 600/576 629484B 101687 ms 5.90 fps [ETA 0:00:04] 344F 249F 215F 206F 195F 168F 70F 5095F 186F 230F 206F 21Pass 2/2 frame 600/600 643500B 8580b/f 257400b/s 74431 ms (8.06 fps)
As demonstrated in these examples, --cpu-used=6
is 30% faster than --cpu-used=2
.