Saturday, January 20, 2018

Raspberry Pi - The Audio Engine - Part 5 - customized squeezelite

This is NOT the first post of the Audio Engine series. I'd recommend starting the journey at Part 1.


Most of the project is done by now. If you're still looking for more, there is more. ;)

The result of this exercise might not be as earth-shattering as you'd expect it to be.
IMO it's still worth it.

Before you start this exercise, I'd recommend getting used to the performance of the pCP setup you ended up with after Part 4 for a couple of days. Just let it sink in.
Then go ahead with this measure.


What are we gonna do now !?!?

Let's have a look at the actual pCP audio engine - squeezelite. To be exact, the squeezelite binary. (Windows folks are more used to the term executable.)


What we can do quite easily is make the squeezelite binary run a bit more efficiently.



More efficient !?!? Yep. I hear you.

Besides coding quality and efficient source code, there are several items that have an impact on the efficiency of the resulting binary:

1. the compiler (even the compiler version matters)
2. the compiling options (generic architecture vs. HW=CPU/GPU specific)
3. static vs dynamic linking of used libraries
4. features and size
5. debugging options


The original squeezelite was very well written by Triode. Performance and efficiency have been a huge focus of his. squeezelite was and still is a great piece of code.

During this exercise we'll focus on items 2 and 4 of the list above.

A little background.

You can compile a binary with rather generic compiler options. Such binaries
run on several boards belonging to the same architecture, like ARMv7.
That's the way it's done by pretty much all RPi OSes.
The resulting binary is so generic that it runs on all ARMv7 computers.
That comes at a price: you can't make use of valuable CPU- or GPU-specific features.
That's what we're gonna change.

As an example, I measured a 10-20% performance difference just by changing compiler options on the widely used linpack benchmarking tool. In other words, the results of the benchmark tool itself depended on the compiler options for that specific hardware and on the gcc version.
I saw similar behavior when running benchmarks on customized flac and sox binaries.

What does that mean? If you look at any software benchmarking results and you know neither the compiler options nor the gcc (compiler) version being used, the benchmark results are anything but reliable - they are actually pretty useless!
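To make that point a bit more concrete: a small sketch of how you could note down the compiler version and the flags next to any benchmark number. The function name build_info is made up for illustration, and it simply reports "unknown" if no gcc is installed.

```shell
# Print the compiler version and the flags in effect, so they can be
# stored together with a benchmark result.
# build_info is a hypothetical helper name, not part of pCP or squeezelite.
build_info() {
    if command -v gcc >/dev/null 2>&1; then
        # The first line of `gcc --version` names the compiler and its version
        echo "compiler: $(gcc --version | head -n 1)"
    else
        echo "compiler: unknown"
    fi
    echo "CFLAGS: ${CFLAGS:-<none set>}"
}

build_info
```

Redirect the output into a text file that sits next to your results and you'll still know later on what you actually measured.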

Let's get the job done.


1. We'll use the default gcc version. As of Nov-2018, gcc on pCP is at 7.1.0.

    Unfortunately it's outdated. 8.2 is the current gcc version.
2. We add RPI "model specific" compiler options
3. The pCP folks switched from static to dynamic linking not long ago.
4. We skip the resampling and DSD/DoP capabilities (for now)
5. Debugging can be controlled via userspace command line options.


OK. Let's get started.



You should be able to complete this exercise in no more than half an hour, btw.

Increase Filesystem

Since we download some libraries and the sources plus the compiling environment, we need to increase the pretty tight default 50MB filesystem to 200MB first - we still have a little less than 800MB of RAM-disk space (/tmp) to play with afterwards.
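If you want to see how much room there actually is before and after the resize, a little helper like the one below might do. It's just a sketch: free_mb is a made-up name, and I'm assuming the df on pCP accepts the POSIX options -P and -k.

```shell
# Report the free space in MB for a directory or mount point.
# free_mb is a hypothetical helper name. df -P -k is POSIX; that the
# busybox df on pCP behaves the same way is an assumption.
free_mb() {
    # Line 2 of `df -P` is the data row, column 4 is "Available" in KB
    df -P -k "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

echo "free in /tmp: $(free_mb /tmp) MB"
```

Run it against /tmp for the RAM disk, or against the mounted SD card partition to check the result of the resize.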

Go to the Main Page and select the "Advanced" or "Beta" mode. Then select "resize FS".







Now set the 200MB size and start the process. The system reboots twice. It'll take two to three minutes.






Once that is done, we log in via ssh. By now you know how to do it.



1. First we install the required packages:
###############################

tce-load -wi compiletc wget libasound-dev flac-dev libvorbis-dev libmad-dev 
tce-load -wi mpg123-dev faad2-dev


###############################
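Before moving on, it doesn't hurt to check that the tools really are on the PATH. A minimal sketch: missing_tools is a made-up helper name, and the list of tools is my assumption of what the following steps need (gcc and make should come with compiletc).

```shell
# List which of the given commands are not on the PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool list: gcc and make for compiling, wget and unzip for step 2
missing="$(missing_tools gcc make wget unzip)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose to print the names on one line
    echo "still missing:" $missing
else
    echo "build tools are in place"
fi
```

If something shows up as missing, re-run the tce-load commands above and watch their output for errors.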

2. Download and unzip the squeezelite sources
###############################

cd /tmp
wget https://github.com/ralph-irving/squeezelite/archive/master.zip

unzip master.zip

cd squeezelite-master



###############################

3. Compile the sources 


#####PI3 + 3B+: ###############################

Last update (Nov-15-2018):


export CFLAGS="-O3 -mcpu=cortex-a53 -mtune=cortex-a53 -mfpu=neon-fp-armv8 -mneon-for-64bits -mfloat-abi=hard"

make clean
make


Below is the old set of flags - just for reference:
export CFLAGS="-O3 -mcpu=cortex-a53 -mfpu=neon-fp-armv8 -fno-delayed-branch -fno-selective-scheduling2 -fno-whole-program -mfloat-abi=hard -fno-fast-math"

###############################

For those who run a PI2: you might want to try the PI2-specific compiler flags below instead of the ones above:

###############################

export CFLAGS="-O3 -mcpu=cortex-a7 -mfpu=neon-vfpv4  -mfloat-abi=hard -funsafe-math-optimizations"
make clean
make

################################
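If you build for more than one board, you could wrap the two flag sets from above into a small function so you don't mix them up. This is just a sketch: pick_cflags and the board names pi2/pi3 are made up, the flags themselves are the ones listed in this post.

```shell
# Map a board name to the CFLAGS listed in this post.
# pick_cflags and the names pi2/pi3 are hypothetical, chosen for this sketch.
pick_cflags() {
    case "$1" in
        pi3)  # PI3 and PI3B+ (Cortex-A53)
            echo "-O3 -mcpu=cortex-a53 -mtune=cortex-a53 -mfpu=neon-fp-armv8 -mneon-for-64bits -mfloat-abi=hard" ;;
        pi2)  # PI2 (Cortex-A7)
            echo "-O3 -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard -funsafe-math-optimizations" ;;
        *)
            echo "unknown board: $1" >&2
            return 1 ;;
    esac
}

CFLAGS="$(pick_cflags pi3)"
export CFLAGS
echo "$CFLAGS"
# then, inside /tmp/squeezelite-master:  make clean && make
```

The build itself stays exactly as described above. The function only saves you from copying the wrong line.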

4. Install the binary

This step applies to pCP >= 3.5 !

###############################


sudo su


cp -f squeezelite /mnt/mmcblk0p2/tce/squeezelite-custom


chmod 777 /mnt/mmcblk0p2/tce/squeezelite-custom

sync
reboot

###############################


As soon as the system is up'n running again, you have to enable the new custom binary in the browser under "Squeezelite Settings".





Done.


Hint: 

You could now take your new squeezelite-custom binary from that 200MB image and move it to your already backed-up lean 50MB image, to the same spot as described above: "/mnt/mmcblk0p2/tce". This way you'd be able to run the 50MB image with the new binary and without all the extra packages and larger partition.
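When moving the binary around, it might be worth making sure the copy is really identical to what you built. A minimal sketch, assuming cp, chmod and cmp are available on your pCP install. install_verified is a made-up name, and it uses mode 755 where the step above uses 777 (755 is enough to make the file executable).

```shell
# Copy a binary into place, make it executable and verify the copy.
# install_verified is a hypothetical helper name.
install_verified() {
    src="$1"
    dst="$2"
    cp -f "$src" "$dst" || return 1
    chmod 755 "$dst" || return 1
    # cmp -s prints nothing and returns 0 only if both files are identical
    cmp -s "$src" "$dst"
}

# Example call (as root, on the 50MB image):
#   install_verified squeezelite-custom /mnt/mmcblk0p2/tce/squeezelite-custom && sync
```

If the function returns non-zero, the copy failed or the two files differ. In that case don't reboot - copy again first.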



That'll be it.

As usual: make a backup once everything is up'n running fine.



Once more: you'll be lacking the resampling, DSD, infrared and GPIO features when using the slim binary above! I'd guess most of you will get along with it.


Enjoy your new speedy binary. Please, let me know if it makes a difference.





4 comments:

  1. Hi Klaus,

    What can I do if I am unable to resize the file system?

    Perhaps because I expanded the SD card a few months ago it tells me "WARNING: Not enough space available for expansion (only 4 MB) (Choose a size between the current partition size: 7707 MB and the maximum partition size: 7711 MB"

    Any thoughts?

    Thanks

    Orlando

    Replies
    1. Just go ahead. You maxed-out your partition already.
      However.
      You haven't been following the series from 1-5.
      Better try that!
  2. Solution to my previous post: I used "sudo make clean" and "sudo make" and it worked.

  3. Hi Klaus,
    I have implemented your recommendations for 3.5 version on 3 RPI boards (for my friends) + PS + network. Huge improvement in sound quality. I am struggling to compile pcp 4.0 on RPI 3 B+. Are compiler flags the same? Thank you.
