Friday, June 15, 2018

flac on steroids - Part 2

As a fallout of my earlier flac benchmarking exercise, I looked into how the compression level affects the performance of the flac binary.


If highest efficiency is the goal, we need to look at that topic as well.

It's been discussed out there that the compression level does have an impact on performance.

Usually you'll find "opinions" out there like "it's fast enough - forget about it" or "differences can be neglected". Even the designers try to play it down - you'll find the link to the discussion further down!

Let's find out what we're talking about here. 



flac offers compression levels (CL) 0 to 8.
On top of that, certain options can be applied to generate a 10th format: a "Non-Compressed" flac audio file.

If you just look at the resulting file sizes, the obvious reason for having different CLs,
I'd say leave this exercise alone - it's not worth it.

However, I haven't seen any properly executed benchmarking exercise looking at the performance.

If we want to get the picture about efficiency straight, we'd better look into it.

Let's start looking at a very special CL first.
Aah. Non-Compressed (NC) flac !?!? What's that !?!?

OK. The original idea behind NC flacs is that people were looking for a .wav file wrapped in a flac container with all its tagging features. An interesting idea.
Then there are others, "the audiophiles" (yep - that also includes me - I'd consider myself a techie and an audiophile), who claim to hear differences between CLs and also between flac and .wav. Typically .wav is said to be the preferred format.

dbPoweramp is the only (GUI-based) tool I'm aware of that offers the Non-Compressed option.

You can achieve the same result by using the flac binary with the following options:

"--compression-level-0 --disable-constant-subframes --disable-fixed-subframes"


By default, the flac binary applies CL5. You'd have to intervene manually to get your CL of choice.

Keep in mind! Once a file is encoded, you can't figure out anymore which compression level was used to encode it!
You'd need to re-encode a file (or a collection) to a certain CL to be sure what you've got in front of you. There are batch converters out there doing it for you. Try your favorite album first - and don't overwrite the originals!
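
If you'd rather script the re-encoding than use a batch converter, here's a rough sketch of how it could look. The function name reencode_dir and the DRY_RUN switch are my own inventions for this example, not part of flac. By default it only prints the flac commands, and it always writes to a separate target directory so the originals stay untouched.

```shell
#!/bin/sh
# Sketch: re-encode every .flac in a source directory to a known
# compression level, writing the results to a separate directory.
# reencode_dir and DRY_RUN are hypothetical names used for illustration.

reencode_dir() {
    src=$1
    dst=$2
    level=${3:-0}                      # default to CL0
    mkdir -p "$dst" || return 1
    for f in "$src"/*.flac; do
        [ -e "$f" ] || continue        # skip if the glob matched nothing
        out="$dst/$(basename "$f")"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run: only show the command that would be executed.
            echo "flac -f --compression-level-$level -o $out $f"
        else
            flac -f "--compression-level-$level" -o "$out" "$f"
        fi
    done
}
```

Calling reencode_dir with a source directory, a target directory and a level prints the planned commands. Setting DRY_RUN=0 in the environment runs them for real.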


Let's get the benchmarking work done.


For the tests I've been using the CRC-optimized flac built from git sources, as discussed earlier.


First I generated several flacs from my 44.1/16bit test16.wav:
flac -f --compression-level-0 -o test16-cl0.flac test16.wav
flac -f --compression-level-5 -o test16-cl5.flac test16.wav
flac -f --compression-level-0 --disable-constant-subframes --disable-fixed-subframes -o test16-nocomp-cl0.flac test16.wav
flac -f -l 0 --disable-constant-subframes --disable-fixed-subframes -o test16-nocomp-l0.flac test16.wav
flac -f -0 --disable-constant-subframes --disable-fixed-subframes -o test16-nocomp-0.flac test16.wav
(same filesize for all 3 NC variants above!)

Filesizes (bytes):
test16.wav = 103887884
test16-cl0.flac = 41017617
test16-cl5.flac = 39387134
test16-nocomp-l0.flac = 104165565

As expected, the NC file slightly exceeds the original .wav size.
The file size difference between CL0 and CL5 can (IMO) almost be neglected.
I also generated three different NC files to prove that the different option spellings generate the
same file. (This came out of a discussion I had with a flac designer.)
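
To put numbers on "almost be neglected", here's a small sketch that turns the file sizes above into percentages. The helper name pct_of is made up for this example; the sizes are copied from the listing.

```shell
# File sizes in bytes, copied from the listing above.
WAV=103887884
CL0=41017617
CL5=39387134
NC=104165565

# Print the size $1 as a percentage of the size $2 (two decimals).
# LC_ALL=C keeps the decimal point independent of the system locale.
pct_of() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}

echo "CL0 is $(pct_of "$CL0" "$WAV")% of the .wav"
echo "CL5 is $(pct_of "$CL5" "$WAV")% of the .wav"
echo "NC  is $(pct_of "$NC" "$WAV")% of the .wav"
echo "CL5 is $(pct_of "$CL5" "$CL0")% of CL0"
```

So CL5 saves only about 4% on top of CL0 for this particular test file. Other material will of course compress differently.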

I then ran the performance tests. I ran each of the tests several times,
and the tool itself (perf) ran 10 loops per test.

I used the new CRC-optimized flac build from git sources with gcc opts set to "-O3 -march=broadwell", with avx2 and nasm in place.
Here's the procedure:

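########################################################################
# Decode benchmark: 10 perf runs per test file (needs root for the governor switch)
BIN=/tmp/flac-git-opt
# Set all cores to the "performance" governor to reduce frequency-scaling noise
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sleep 2
for i in test16-cl0.flac test16-cl5.flac test16-nocomp-cl0.flac ; do
    echo "************************"
    echo
    echo "$i"
    # -r 10: repeat the decode 10 times; -f -d: decode and overwrite the output file
    perf stat -r 10 -B "$BIN" --totally-silent -f -d "/tmp/$i"
    sleep 3
    sync
    echo
done
########################################################################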


And here are the results:

****test16-cl0.flac
Performance counter stats for '/tmp/flac-git-opt --totally-silent -f -d /tmp/test16-cl0.flac' (10 runs):
    620,669145      task-clock (msec)         #    0,999 CPUs utilized            ( +-  0,36% )
             4      context-switches          #    0,007 K/sec                    ( +- 12,77% )
             0      cpu-migrations            #    0,000 K/sec                  
           102      page-faults               #    0,164 K/sec                    ( +-  0,31% )
 1.659.587.593      cycles                    #    2,674 GHz                      ( +-  0,21% )
 3.544.392.275      instructions              #    2,14  insn per cycle           ( +-  0,00% )
   265.420.089      branches                  #  427,635 M/sec                    ( +-  0,01% )
     7.799.719      branch-misses             #    2,94% of all branches          ( +-  0,12% )

   0,620986301 seconds time elapsed                                          ( +-  0,37% )

****test16-cl5.flac
Performance counter stats for '/tmp/flac-git-opt --totally-silent -f -d /tmp/test16-cl5.flac' (10 runs):
    702,953210      task-clock (msec)         #    1,000 CPUs utilized            ( +-  0,28% )
             5      context-switches          #    0,007 K/sec                    ( +- 14,89% )
             0      cpu-migrations            #    0,000 K/sec                    ( +- 50,92% )
           120      page-faults               #    0,171 K/sec                    ( +-  0,33% )
 1.879.136.299      cycles                    #    2,673 GHz                      ( +-  0,15% )
 4.430.605.638      instructions              #    2,36  insn per cycle           ( +-  0,00% )
   264.470.255      branches                  #  376,227 M/sec                    ( +-  0,00% )
     7.447.174      branch-misses             #    2,82% of all branches          ( +-  0,07% )

   0,703254931 seconds time elapsed                                          ( +-  0,28% )

****test16-nocomp-cl0.flac
Performance counter stats for '/tmp/flac-git-opt --totally-silent -f -d /tmp/test16-nocomp-cl0.flac' (10 runs):
    993,153306      task-clock (msec)         #    1,000 CPUs utilized            ( +-  0,27% )
             4      context-switches          #    0,005 K/sec                    ( +- 12,06% )
             0      cpu-migrations            #    0,000 K/sec                    ( +- 55,28% )
           102      page-faults               #    0,103 K/sec                    ( +-  0,32% )
 2.658.086.321      cycles                    #    2,676 GHz                      ( +-  0,25% )
 7.457.070.868      instructions              #    2,81  insn per cycle           ( +-  0,00% )
   920.078.916      branches                  #  926,422 M/sec                    ( +-  0,00% )
     1.298.655      branch-misses             #    0,14% of all branches          ( +-  0,87% )

   0,993540048 seconds time elapsed                                          ( +-  0,27% )
Result summary (elapsed time in seconds, relative to CL0):
CL0 = 0.620986301
CL5 = 0.703254931 (+13.2%)
NC  = 0.993540048 (+60%)
Wow. That's a surprise.
+60% on the Non-Compressed flac. I had expected it to be faster than CL0!?!?
I then learned from the flac designer that flac still runs several steps of the "decode" process...

...now on a much larger Non-Compressed file.

That for sure can make the difference. The perf counters back this up: the NC decode executes about 7.46 billion instructions versus 3.54 billion for CL0.

And that also means:
A Non-Compressed flac doesn't equal a .wav file in its data structure!
An NC flac still needs to be processed!
And IMO that's pretty much killing the "Non-Compression" case...

Another conclusion: 

Decoding CL5 is 13.2% slower than decoding CL0.
If many of us appreciate a >5% performance increase from a new CRC algorithm,
then choosing the right compression level has a more than relevant impact on the overall decoding performance on top of that.
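
For reference, the percentages quoted in the result summary can be re-derived from the elapsed times that perf reported. A quick sketch, with slower_pct being a helper name I made up for this:

```shell
# Print how much slower time $2 is than the baseline time $1, in percent.
# LC_ALL=C keeps the decimal point independent of the system locale.
slower_pct() {
    LC_ALL=C awk -v base="$1" -v t="$2" \
        'BEGIN { printf "%.1f\n", 100 * (t / base - 1) }'
}

# Elapsed seconds from the perf stat output (mean of 10 runs each).
T_CL0=0.620986301
T_CL5=0.703254931
T_NC=0.993540048

echo "CL5 vs CL0: +$(slower_pct "$T_CL0" "$T_CL5")%"
echo "NC  vs CL0: +$(slower_pct "$T_CL0" "$T_NC")%"
```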


Of course a .wav beats them all. There's no decompression and no conversion.


Wrap-Up

What are the learnings of Part 1 and Part 2 of the "flac on steroids" mini series?

If you look for best performance and highest efficiency you need to have a look at the binary AND the data - the compression level in particular. 

In the real-world scenarios I've shown, the performance gain adds up to more than 25% - binary optimizations plus compression level optimization combined.

That's not bad, not bad at all I'd say.

You can't rely on your distribution or SW package (e.g. LMS/sox) to provide you with highest performance SW. You'd need to compile it yourself for your own platform. That's e.g. the whole idea behind the Gentoo operating system - the entire system gets compiled for your specific architecture. 

This exercise also confirms to me that I've been on the right track all along with my own stuff by using CL0 flacs. 

And decoding the flacs (and doing the DSP work) on my Intel NUC server instead of the RPi doesn't seem to be the worst idea either.

This exercise also shows that .wav is still the preferable format from an efficiency perspective. However, if you add the "streaming-load" effect to the equation,
.wav at more than double the size does put some extra load on the setup.
The main issues with .wav are compatibility and tagging - thus control issues. Nope. I don't want to miss all the tag info in my control app.
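
To get a feeling for that "streaming-load" difference, here's a rough sketch that estimates average stream bitrates from the file sizes listed earlier. It assumes test16.wav is plain 2-channel 44.1 kHz / 16 bit PCM (the channel count is my assumption) and ignores header overhead. The helper names are made up for this example.

```shell
# File sizes in bytes, copied from the listing earlier in this post.
WAV=103887884
CL0=41017617

# Assumption: 2 channels, 44.1 kHz, 16 bit -> raw PCM bitrate in kbit/s.
PCM_KBPS=$(LC_ALL=C awk 'BEGIN { printf "%.1f", 44100 * 16 * 2 / 1000 }')

# Print how many times larger size $1 is than size $2.
size_ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Approximate average flac bitrate: PCM bitrate scaled by flac size / wav size.
flac_kbps() {
    LC_ALL=C awk -v pcm="$PCM_KBPS" -v f="$1" -v w="$2" \
        'BEGIN { printf "%.1f\n", pcm * f / w }'
}

echo "PCM / .wav stream: $PCM_KBPS kbit/s"
echo ".wav is $(size_ratio "$WAV" "$CL0")x the size of the CL0 flac"
echo "CL0 flac stream:   about $(flac_kbps "$CL0" "$WAV") kbit/s on average"
```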

Here we are again. As usual you need to go for the best compromise, and that compromise might even look different from system to system.

Since I decode my flacs on the server prior to playback, stream them as PCM - the actual .wav data format - and bulk-store these into a local RAM buffer, there's no need for .wav files. I can still enjoy all the advantages brought to us by flac...

...and here it comes...  ...all that while not experiencing any impact on perceived sound quality - compared to the .wav reference file format! A no-compromise solution! ;)


Enjoy.


1 comment:

  1. Hi
    An interesting case is to benchmark ffmpeg vs native flac decompression with console silent options. Interestingly I found ffmpeg (both 3.x and 4.0) faster than flac (v1.3.2) by some significant margin. I wonder if another one can replicate results. It would be nice to compare both compression and decompression also with flac 1.3.2 and latest development with ffmpeg versions.
