Friday, June 15, 2018

flac on steroids - Part 2

As a spin-off of my earlier flac benchmarking exercise, I looked into how the compression level affects the performance of the flac binary.


If highest efficiency is the goal, we certainly need to look at that topic as well.

It's been discussed out there that the compression levels do have an impact on performance.

The usual "opinions" you'll find are "it's fast enough - forget about it" or "the differences are negligible". Even the designers try to play it down - you'll find the link to the discussion further down!

Let's find out what we're talking about here. 


flac offers compression levels (CL) 0 to 8.
On top of that, certain options can be applied to generate a 10th format: a "Non-Compressed" flac audio file.

If you only look at the resulting file sizes, which is the obvious reason for having different CLs,
I'd say leave this exercise alone - it's not worth it.

However, I haven't seen any properly executed benchmark that looks at the performance.

If we want to get the efficiency picture straight, we'd better look into it.

Let's start looking at a very special CL first.
Aah. Non-Compressed (NC) flac !?!? What's that !?!?

OK. The original idea behind NC flacs is that people were looking for a .wav file wrapped in a flac container, with all its tagging features. An interesting idea.
Then there are others, "the audiophiles" (yep - that also includes me - I'd consider myself a techie and an audiophile), who claim to hear differences between CLs and also between flacs and wavs. Typically a .wav is said to be the preferred format.

dBpoweramp is the only GUI-based tool I'm aware of that offers the Non-Compressed option.

You can achieve the same result by using the flac binary with the following options:

"--compression-level-0 --disable-constant-subframes --disable-fixed-subframes"


By default, the flac binary applies CL5. You'd have to intervene manually to get your CL of choice.

Keep in mind! Once a file is encoded, you can't figure out which compression level was used to encode it!
You'd need to re-encode a file (or collection) to a given CL to be sure what you've got in front of you. There are batch converters out there that do it for you. Try your favorite album first - and don't overwrite the originals!
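
If you'd rather script it than use a batch converter, here is a minimal sketch. It assumes the stock flac command-line tool is on the PATH and relies on flac accepting .flac files as input (it decodes and re-encodes them and carries the tags over). The function name reencode_album and the directory names in the usage line are made up for illustration.

```shell
# Re-encode every .flac in SRC into DST at compression level LEVEL (0-8).
# Files in SRC are only read, never overwritten.
reencode_album() {
    src=$1 dst=$2 level=$3
    # Refuse to write into the source directory.
    [ "$src" != "$dst" ] || { echo "source and target must differ" >&2; return 1; }
    mkdir -p "$dst" || return 1
    for f in "$src"/*.flac; do
        [ -e "$f" ] || continue   # no .flac files: the glob stays literal
        flac --totally-silent -f "--compression-level-$level" \
             -o "$dst/$(basename "$f")" "$f" || return 1
    done
}

# Usage (hypothetical paths):
#   reencode_album "/music/MyAlbum" "/music/MyAlbum-cl0" 0
```

Writing into a separate target directory keeps the originals untouched, so you can compare both versions before deleting anything.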


Let's get the benchmarking work done.


For the tests I used the CRC-optimized flac build made from git sources, as discussed earlier.


First I generated several flacs from my 44.1 kHz/16-bit test16.wav:
flac -f --compression-level-0 -o test16-cl0.flac test16.wav
flac -f --compression-level-5 -o test16-cl5.flac test16.wav
flac -f --compression-level-0 --disable-constant-subframes --disable-fixed-subframes -o test16-nocomp-cl0.flac test16.wav
flac -f -l 0 --disable-constant-subframes --disable-fixed-subframes -o test16-nocomp-l0.flac test16.wav
flac -f -0 --disable-constant-subframes --disable-fixed-subframes -o test16-nocomp-0.flac test16.wav
(same file size for all three NC files above!)
File sizes (bytes):
test16.wav = 103887884

test16-cl0.flac = 41017617
test16-cl5.flac = 39387134
test16-nocomp-l0.flac = 104165565
As expected, the NC file slightly exceeds the original .wav size (by about 0.3%).
The file size difference between CL0 and CL5 (about 4%) can (IMO) almost be neglected.
I also generated three different NC files to prove that the different ways of writing the options generate the
same file. (This was the result of a discussion I had with a flac designer.)
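
To put the sizes into proportion, here's a small sketch that expresses each file as a percentage of a reference size. The helper name size_pct is my own invention; the byte counts are the ones listed above. awk is forced to the C locale so the output uses a decimal point even on a comma-decimal system like mine.

```shell
# Express SIZE_BYTES as a percentage of REFERENCE_BYTES (one decimal place).
# LC_ALL=C keeps the decimal separator a dot regardless of the system locale.
size_pct() {
    LC_ALL=C awk -v ref="$1" -v size="$2" \
        'BEGIN { printf "%.1f\n", 100 * size / ref }'
}

wav=103887884
echo "CL0 vs wav: $(size_pct "$wav" 41017617)%"      # 39.5%
echo "CL5 vs wav: $(size_pct "$wav" 39387134)%"      # 37.9%
echo "NC  vs wav: $(size_pct "$wav" 104165565)%"     # 100.3%
echo "CL5 vs CL0: $(size_pct 41017617 39387134)%"    # 96.0%
```

So going from CL0 to CL5 only buys roughly 4% in file size for this test file.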

I then ran the performance tests. I ran each test several times,
and the tool itself (perf stat -r 10) ran 10 loops per test.

I used the new CRC-optimized flac build from git sources, with the gcc options set to "-O3 -march=broadwell" and with avx2 and nasm in place.
Here's the procedure:

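########################################################################
# Run as root: writing the scaling governor requires it.
BIN=/tmp/flac-git-opt
# Switch all cores to the "performance" governor to reduce frequency-scaling noise.
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sleep 2
for i in test16-cl0.flac test16-cl5.flac test16-nocomp-cl0.flac ; do
    echo "************************"
    echo
    echo "$i"
    # 10 decode runs per file; perf stat reports the mean and the variance.
    perf stat -r 10 -B "$BIN" --totally-silent -f -d "/tmp/$i"
    sleep 3
    sync
    echo
done
########################################################################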


And here are the results:

****test16-cl0.flac
Performance counter stats for '/tmp/flac-git-opt --totally-silent -f -d /tmp/test16-cl0.flac' (10 runs):
    620,669145      task-clock (msec)         #    0,999 CPUs utilized            ( +-  0,36% )
             4      context-switches          #    0,007 K/sec                    ( +- 12,77% )
             0      cpu-migrations            #    0,000 K/sec                  
           102      page-faults               #    0,164 K/sec                    ( +-  0,31% )
 1.659.587.593      cycles                    #    2,674 GHz                      ( +-  0,21% )
 3.544.392.275      instructions              #    2,14  insn per cycle           ( +-  0,00% )
   265.420.089      branches                  #  427,635 M/sec                    ( +-  0,01% )
     7.799.719      branch-misses             #    2,94% of all branches          ( +-  0,12% )

   0,620986301 seconds time elapsed                                          ( +-  0,37% )

****test16-cl5.flac
Performance counter stats for '/tmp/flac-git-opt --totally-silent -f -d /tmp/test16-cl5.flac' (10 runs):
    702,953210      task-clock (msec)         #    1,000 CPUs utilized            ( +-  0,28% )
             5      context-switches          #    0,007 K/sec                    ( +- 14,89% )
             0      cpu-migrations            #    0,000 K/sec                    ( +- 50,92% )
           120      page-faults               #    0,171 K/sec                    ( +-  0,33% )
 1.879.136.299      cycles                    #    2,673 GHz                      ( +-  0,15% )
 4.430.605.638      instructions              #    2,36  insn per cycle           ( +-  0,00% )
   264.470.255      branches                  #  376,227 M/sec                    ( +-  0,00% )
     7.447.174      branch-misses             #    2,82% of all branches          ( +-  0,07% )

   0,703254931 seconds time elapsed                                          ( +-  0,28% )

****test16-nocomp-cl0.flac
Performance counter stats for '/tmp/flac-git-opt --totally-silent -f -d /tmp/test16-nocomp-cl0.flac' (10 runs):
    993,153306      task-clock (msec)         #    1,000 CPUs utilized            ( +-  0,27% )
             4      context-switches          #    0,005 K/sec                    ( +- 12,06% )
             0      cpu-migrations            #    0,000 K/sec                    ( +- 55,28% )
           102      page-faults               #    0,103 K/sec                    ( +-  0,32% )
 2.658.086.321      cycles                    #    2,676 GHz                      ( +-  0,25% )
 7.457.070.868      instructions              #    2,81  insn per cycle           ( +-  0,00% )
   920.078.916      branches                  #  926,422 M/sec                    ( +-  0,00% )
     1.298.655      branch-misses             #    0,14% of all branches          ( +-  0,87% )

   0,993540048 seconds time elapsed                                          ( +-  0,27% )
Result summary (elapsed time in seconds, relative to CL0):
CL0=0.620986301
CL5=0.703254931 +13.2%
CLN=0.993540048 +60% (CLN = Non-Compressed)
Wow. That's a surprise.
+60% on the Non-Compressed flac. I'd expected it to be faster than CL0!?!?
I then learned from the flac designer that flac still runs several steps of the "decode" process...

...now on a much larger Non-Compressed file.

That for sure can make the difference. The perf numbers above show it: the NC decode executes about 7.46 billion instructions, compared to about 3.54 billion for CL0.

And that also means:
In terms of its data structure, a Non-Compressed flac is not the same as a .wav file!
An NC flac still needs to get processed!
And IMO that pretty much kills the "Non-Compression" case...

Another conclusion: 

Using CL5 is 13.2% slower than CL0 on the decode.
If many of us appreciate a >5% performance increase from a new CRC algorithm,
then choosing the right compression level will have a more than relevant impact on the overall decoding performance on top of that.
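
For anyone who wants to re-check the percentages, here's a small sketch of the arithmetic. The helper name slowdown_pct is made up; the inputs are the "seconds time elapsed" values from the perf output above, rewritten with a decimal point.

```shell
# Print how much slower (or faster) time T is relative to BASE, in percent.
# LC_ALL=C keeps the decimal separator a dot regardless of the system locale.
slowdown_pct() {
    LC_ALL=C awk -v base="$1" -v t="$2" \
        'BEGIN { printf "%+.1f%%\n", 100 * (t / base - 1) }'
}

cl0=0.620986301
echo "CL5 vs CL0: $(slowdown_pct "$cl0" 0.703254931)"   # +13.2%
echo "NC  vs CL0: $(slowdown_pct "$cl0" 0.993540048)"   # +60.0%
```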


Of course a .wav beats them all. There's no decompression or conversion involved.


Wrap-Up

What are the learnings of Part 1 and Part 2 of "flac on steroids"?

If you're looking for best performance and highest efficiency, you need to have a look at the binary AND the data. In the real-world scenarios I've shown, the performance gain adds up to more than 25% - for binary optimizations plus compression level optimization. Not too bad.

You can't rely on your distribution or SW package (e.g. LMS/sox) to provide you with the highest-performance SW. You'd need to compile it yourself for your own platform.

This exercise also confirms to me that I'm on the right track with my own stuff by using CL0 flacs.
And decoding flacs on my Intel NUC server instead of the RPi doesn't seem to be the worst idea either.

And the exercise also shows that .wav, from an efficiency perspective, is still the preferable format. However, if you add the "streaming-load" effect to the equation,
a wav at more than double the size of the flac does add some extra load on that account.

As usual, you need to look for the best compromise.

Since I decode my flacs prior to playback and bulk-store them in a local RAM buffer, I don't need wav, don't face the continuous streaming load issue and can still enjoy the advantages brought to us by flac...

...and here it comes...  ...all that while not experiencing any impact on perceived sound quality - compared to the .wav reference ! ;)


Enjoy.

