the audio streaming series

the custom engine

(Latest update: Feb-13-2021)

Most of the project is done by now. Your system should be running smoothly as we speak, and I hope you're already enjoying the result of your work. If you're still looking for more, here's the good news: there is more, and there always will be. After all these years I've come to realize there's got to be some truth in the "Everything Matters" principle.

Let's give it another shot. The result of this exercise might not be what I'd call earth-shattering, though I'm positive you're gonna tell me whether it's been worth the effort.

What is this exercise all about?

We're gonna build our own engine. A custom engine. An engine that fits the chassis. What am I talking about? You're gonna build your own custom squeezelite. squeezelite, the engine!?!? Yep. That's where the magic happens, where the fuel gets burned. The entire audio streaming OS, no, actually the entire upstream streaming environment, is built and configured to serve this tiny application. Until now we've mostly been covering environmental topics. It's time to take a closer look under the hood.

The (open source) tool I'll provide puts you in a position to build your very own custom engine. And it won't require expert skills.

You'll be able to choose between two squeezelite source packages: the standard squeezelite sources and my own "soundcheck" version.

Three variants are offered for each of them:

  • minimal          - as slim as it gets
  • DSD & SRC
  • DSD & SRC Multi Processor - to speed up SRC  

The tool auto-detects the platform (RPi3 or RPi4, including the Compute Modules) and its architecture (32-bit or 64-bit). It then applies platform-specific build parameters and optimizations.
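Just to illustrate the idea behind the auto-detection, here's a minimal shell sketch. It's not the builder's actual code. It assumes the board can be identified from the device-tree model string (/proc/device-tree/model) and the word size from `uname -m`, which only describes the kernel. The function name `classify_platform` is made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch: classify a Raspberry Pi from its device-tree model
# string and the machine type reported by `uname -m` (kernel only).
# NOT the builder's code, just the idea.

# classify_platform MODEL ARCH -> prints e.g. "rpi4-64", or "unknown"
classify_platform() {
    model=$1
    arch=$2

    case $model in
        *"Raspberry Pi 4"*|*"Compute Module 4"*) board=rpi4 ;;
        *"Raspberry Pi 3"*|*"Compute Module 3"*) board=rpi3 ;;
        *) echo unknown; return 1 ;;
    esac

    case $arch in
        aarch64)       bits=64 ;;
        armv7l|armv6l) bits=32 ;;
        *) echo unknown; return 1 ;;
    esac

    echo "$board-$bits"
}

# On a real Pi (the model file is NUL-terminated, hence tr -d '\0'):
# classify_platform "$(tr -d '\0' < /proc/device-tree/model)" "$(uname -m)"
```

Matching the "Compute Module" strings explicitly matters, because a CM3 reports itself as e.g. "Raspberry Pi Compute Module 3 Plus" rather than "Raspberry Pi 3".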

The idea behind all this effort once more is to go after...

Maximum efficiency


Maximum efficiency!?!? Yep. I hear you.

Let's first talk a bit about writing, developing and handling software.

Code quality, which among other things requires efficiently written code, is not the only thing that matters when it comes to a program's efficiency.

Several parameters can affect how efficiently the final binary executes, such as

  • the platform - 32 or 64 bit
  • the compiler
  • the compiler version 
  • the compiling options 
  • application specific compiling features
  • specific CPU related optimizations
  • application features
  • and more

All this is gonna be addressed as part of this exercise.


To put some meat on the bones, I'd like to give a few examples.

1.
It does matter whether you run a 32-bit or a 64-bit platform. I'm talking about the kernel AND the user-space running 64-bit.
The compiler (gcc) uses completely different sets of default compiler options for 32-bit and 64-bit systems.
On a 64-bit system, compiler optimizations are to a large extent enabled by default. Many applications will benefit just from running on a 64-bit platform. Have a look at these benchmarks. This is no hocus pocus.

2.
The compiler is a key player. The compiler and its version can make a difference. Compilers are living creatures: they are continuously evolving (usually improving).
Debian-based systems (RPiOS/Moode/DietPi/Volumio) usually trail the OS crowd on the compiler front (currently gcc 8.x). That's not great. Debian lags up to 2 years behind e.g. Gentoo, Arch Linux and Fedora (and now pCP 7.0) based systems (currently gcc 10.x). Up to 2 years I consider a lot!

3.
Now let's talk a bit about the binary configuration. While preparing to compile a binary, you can usually configure certain features to be included or left out. These can be optional optimizations (SSE/NEON etc.) and functionalities like DSD, SRC, infrared or GPIO support, you name it. As an example, I've been testing and benchmarking the still not officially implemented ARM NEON support for flac and libflac, an optimization. During that little study I found the NEON-enabled flac was about 27.5% faster than the flac version without it. From a software designer's perspective that's a lot.

The original squeezelite was very well coded by Adrian (Triode). He put a lot of focus on highest performance and highest efficiency. The default pCP squeezelite binary, however, comes with ALL optional features ON by default. The "lite" factor seems to be going down the drain.
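Back to example 1: since both the kernel AND the user-space matter, `uname -m` alone isn't enough, because it only reports the kernel. Here's a small sketch that reads the ELF class byte (byte 4 of every ELF file: 1 = 32-bit, 2 = 64-bit) of a user-space binary. `elf_bits` is a hypothetical helper, and /bin/busybox is just an assumed example target; any user-space binary will do.

```shell
#!/bin/sh
# Sketch: tell a 32-bit from a 64-bit user-space by reading the ELF
# class byte (EI_CLASS, offset 4) of a binary with POSIX od.

# elf_bits FILE -> prints 32, 64, or "unknown"
elf_bits() {
    # -An: no address column, -tx1: one hex byte, -j4: skip 4 bytes, -N1: read 1
    class=$(od -An -tx1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        01) echo 32 ;;
        02) echo 64 ;;
        *)  echo unknown ;;
    esac
}

# Usage on the player:
# echo "kernel:     $(uname -m)"
# echo "user-space: $(elf_bits /bin/busybox)-bit"
```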


All this is no hocus pocus. Compiling code for a specific CPU with special compiler options and a stripped-down set of features can, and usually does, improve efficiency.
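To make "CPU-specific compiler options" a bit more concrete, here's a sketch that maps a platform to gcc flags. The flags are standard gcc options for the RPi3's Cortex-A53 and the RPi4's Cortex-A72, but picking exactly these is my assumption for illustration, not necessarily what the builder applies. It also shows one concrete 32/64-bit difference: on AArch64, NEON (Advanced SIMD) is part of the baseline, while a 32-bit ARM build has to ask for it via -mfpu.

```shell
#!/bin/sh
# Hypothetical sketch: pick CPU-specific gcc flags per platform.
# Standard gcc options, but NOT necessarily the builder's flags.

# cflags_for PLATFORM -> prints a CFLAGS string
cflags_for() {
    case $1 in
        rpi3-64) echo "-O2 -mcpu=cortex-a53" ;;   # NEON implied on AArch64
        rpi4-64) echo "-O2 -mcpu=cortex-a72" ;;
        rpi3-32) echo "-O2 -mcpu=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard" ;;
        rpi4-32) echo "-O2 -mcpu=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard" ;;
        *)       echo "-O2" ;;                    # unknown: generic flags only
    esac
}

# To see what gcc actually enables for a given set of flags:
# gcc $(cflags_for rpi4-64) -Q --help=target | grep -E 'mcpu|mfpu|float-abi'
```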


OK. Let's quickly introduce what to expect from

soundcheck's squeezelite version


I am maintaining my own squeezelite version. I keep it in sync with the changes being applied to the squeezelite master sources.

I added some small features that I find valuable, and I decided to share my version with you. You might feel tempted to try it. For the time being, that fork focuses exclusively on Linux and the RPi3/RPi4.


What's different compared to the standard squeezelite version?

Main features:

1. Pro Audio Style Volume Control - internal

I laid out the background for this feature in the article about volume controls.

The digital volume controls you usually find in the consumer world are not "linear".
Linear here means that 1 click equals a 1dB change on a 101-step scale.
This feature introduces exactly that. A feature like this is a must in the pro-audio arena.
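Here's the arithmetic behind a 1 click = 1dB scale, as a shell/awk sketch. The mapping is my assumption for illustration and not taken from the -Y implementation: step 100 is 0dB, every step down removes 1dB, and step 0 mutes. Amplitude follows the usual gain = 10^(dB/20), so 20 clicks down (-20dB) means a factor of 0.1. `step_to_db` and `step_to_gain` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch of a "1 click = 1 dB" scale on a 0..100 slider (101 steps).
# Assumption: 100 = 0 dB, each step down = -1 dB, 0 = mute.
# Illustration only, not squeezelite's -Y code.

# step_to_db STEP -> attenuation in dB (e.g. 80 -> -20)
step_to_db() {
    echo $(( $1 - 100 ))
}

# step_to_gain STEP -> linear amplitude factor, gain = 10^(dB/20)
step_to_gain() {
    awk -v s="$1" 'BEGIN {
        if (s <= 0) { printf "%.6f\n", 0; exit }    # step 0: mute
        db = s - 100
        printf "%.6f\n", exp(log(10) * db / 20)
    }'
}
```

So `step_to_gain 94` gives 0.501187: 6 clicks down (-6dB) roughly halves the amplitude, on every part of the scale.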

The internal pro-audio style volume control gets enabled by adding 


-Y 

to the Squeezelite Settings / Various Options field once the custom binary is up'n running.

2. Pro Audio Style Volume Control - external

Certain audio interfaces offer excellent on-DAC volume controls. They use DAC chips that offer linear ranges, usually from 0dBFS down to -128dB. Some expose them to the audio driver and OS in 0.5dB steps, others in 1dB steps. In the end it's up to the audio device manufacturer how to implement it.
The first thing you need to find out before using this feature is whether your audio device manufacturer supports external volume control at all. Then you need to find out what the mixer control is called inside your OS.

As an example, the Allo Boss offers external volume control. The mixer control for this audio interface is named by Allo through the audio driver and is called Digital. Many other manufacturers call the relevant volume control mixer Master.
You can find out by simply typing squeezelite -L or amixer in a terminal once you're logged in via ssh.
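If the mixer list gets long, a tiny filter helps. This sketch assumes the usual `amixer scontrols` output format (`Simple mixer control 'Name',0`); the `mixer_names` helper and card number 0 in the usage line are assumptions for the example.

```shell
#!/bin/sh
# Sketch: print just the simple mixer control names reported by ALSA,
# i.e. the candidates for squeezelite's -V option.
# Input lines look like:  Simple mixer control 'Digital',0

# mixer_names: reads `amixer scontrols` output on stdin, one name per line
mixer_names() {
    sed -n "s/^Simple mixer control '\(.*\)',[0-9]*\$/\1/p"
}

# Usage on the player (card 0 assumed; adjust -c for your DAC):
# amixer -c 0 scontrols | mixer_names
```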

To enable the external pro-audio volume control feature, add the relevant options under Squeezelite Settings / Various Options.

Sticking with the example above, for a DAC that offers external volume control, like the Allo Boss, you'd add

-V Digital -X

to the options field.

Another nice side effect of running an external VC, besides having a 1click=1dB scale, is that basically nothing is changed in the audio stream before it hits the DAC hardware. The internal volume control simply gets bypassed.
Such a feature is mandatory if you'd like to play DoP files and still want a software-driven volume control: a DoP stream has to reach the DAC untouched, so it can't be attenuated digitally on its way there. The ESS Sabre ES9038 family supports this feature.


Keep in mind, though, that the implementation quality of external controls can differ. As an example I'd like to mention the Khadas Toneboard. Even though it offers external volume control via USB, accessible through the USB audio driver, it's not the excellent volume control offered by the ESS Sabre 9038 DAC chip itself. It's a simple onboard MCU implementation. Therefore DoP volume control doesn't work on the KTB, and that's why I'm not using it. With my Toneboard I use my own linear internal volume control feature of squeezelite and skip DoP.


3. CPU Affinity Output Thread 



The CPU affinity option -A attaches the ALSA output thread to a single dedicated CPU.
The idea is to let the final processing stage, right before the audio stream hits the audio device, run exclusively on one CPU without any distractions. I mentioned this method earlier in the series.
With this feature built into squeezelite, external tools or settings are no longer needed to accomplish it.


-A


Entering -A in the Squeezelite Settings / Various Options field assigns just the squeezelite output thread (ALSA only) to the last CPU, the 4th one (CPU3).
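To check that -A actually took effect, you can look at each thread's allowed CPU list in /proc. A minimal sketch, assuming the standard Linux /proc layout (/proc/PID/task/TID/status with its Cpus_allowed_list field); `show_affinity` is a hypothetical helper, and I don't assume any particular name for the output thread.

```shell
#!/bin/sh
# Sketch: list every thread of a process with the CPUs it is allowed
# to run on, straight from /proc (no taskset needed).

# show_affinity PID -> one line per thread: "TID NAME ALLOWED_CPUS"
show_affinity() {
    for task in /proc/"$1"/task/*; do
        tid=${task##*/}
        status="$task/status"
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$status")
        cpus=$(sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "$status")
        echo "$tid $name $cpus"
    done
}

# Usage on the player (pass a single PID):
# show_affinity "$(pidof squeezelite)"
# With -A active, one of the threads should be listed with just "3".
```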

To get the most out of this feature, the 4th CPU has to be isolated. An isolated CPU runs exclusively: kernel mechanisms such as CPU load balancing are no longer applied to it.

This is IMO a key tweak! Don't miss it!!!

UPDATE Feb-01-2021: 
From version 1.4 of the "pCPt custom squeezelite builder" onwards, all CPU isolation and task affinity settings described below are applied automatically by the provided program for the "soundcheck" custom binary variants. Old settings will be overwritten!
All this simply to make things easier for you and to make sure that everything is done properly.


You configure CPU isolation under Tweaks / pCP Kernel Tweaks / CPU Isolation, as pointed out in the base (OS setup) article.
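To double-check that the isolation is actually in place after the reboot, here's a sketch. It assumes pCP applies CPU isolation through the isolcpus= kernel boot parameter; /sys/devices/system/cpu/isolated and /proc/cmdline are standard kernel interfaces, while `isolcpus_from_cmdline` is a made-up helper.

```shell
#!/bin/sh
# Sketch: confirm which CPUs are isolated. The kernel reports the active
# set in /sys/devices/system/cpu/isolated; the boot parameter itself
# (isolcpus=...) can be read from /proc/cmdline.

# isolcpus_from_cmdline CMDLINE -> prints the isolcpus= value, if any
isolcpus_from_cmdline() {
    for arg in $1; do            # split the command line on whitespace
        case $arg in
            isolcpus=*) echo "${arg#isolcpus=}" ;;
        esac
    done
}

# Usage on the player:
# cat /sys/devices/system/cpu/isolated       # e.g. "2-3"
# isolcpus_from_cmdline "$(cat /proc/cmdline)"
```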

Quite a bit has changed since Jan-31-2021. It turned out that the Raspberry Pi Foundation kernel no longer supports CPU0 isolation. (I filed an official ticket about the issue over at GitHub. Response: "who needs that !?!?")

That forced me to change strategy. I know many people have been following this CPU0 isolation approach.
If you still see it elsewhere, forget it. It won't work anymore!


We now isolate CPU2 and CPU3, leaving CPU2 for the squeezelite main process and two further threads, and using CPU3 for the squeezelite output thread.
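If you ever want to pin a process by hand on pCP, note that BusyBox taskset only takes a hex CPU mask, not a -c CPU list (one of the reader comments on this article ran into exactly that error). A sketch that converts CPU numbers into such a mask; `cpus_to_mask` is a hypothetical helper. For CPU2 and CPU3 together the mask is 0xc, for CPU3 alone 0x8.

```shell
#!/bin/sh
# Sketch: turn CPU numbers into the hex affinity mask BusyBox taskset
# expects (bit N set = CPU N allowed).

# cpus_to_mask CPU... -> prints a hex mask, e.g. "2 3" -> 0xc
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

# Usage (BusyBox syntax: taskset [-p] [HEXMASK] PID):
# taskset -p "$(cpus_to_mask 3)" <PID>
```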

Currently (Feb-01-2021) the "Squeezelite CPU" affinity setting is not working properly. pCP-Paul has been informed and confirmed he'll get it fixed soon.

 

IMPORTANT NOTE:

pCP itself also offers an option for this specific CPU affinity feature. Paul and I discussed this tweak some time ago, and he then made it available via the Web-UI. In the earlier OS setup article I said to assign the Squeezelite Output CPU to the isolated CPU 3. If you stay with the standard (custom) squeezelite binary, this still applies. Keep it at 3.

However, if you plan to use -A with my new soundcheck custom squeezelite, you MUST turn off the pCP feature. If both settings are enabled at the same time, squeezelite won't start.


Now.

Scope


What we'll do during the build exercise:

  • use the OS-provided compiler (gcc 10.x on pCP 7.0)
  • apply RPi model-specific compiler options
  • build a dynamically linked binary
  • without debugging options
  • and limit the embedded features to a minimum (no IR, no DSD, no resampling, no ffmpeg, etc.)
  • optionally take it a notch further by using the soundcheck version
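The feature selection boils down to compile-time switches. Here's a sketch that maps the three variants to the OPTS variable that the upstream squeezelite Makefile passes on to gcc. -DDSD and -DRESAMPLE are upstream switches; -DRESAMPLE_MP for the multi-processor resampler is my assumption here, so check the Makefile of the sources you actually build. The variant names accepted by `opts_for` are made up.

```shell
#!/bin/sh
# Hypothetical sketch: map build variants to squeezelite feature switches.
# -DRESAMPLE_MP is an ASSUMED switch name; verify it in the Makefile.

# opts_for VARIANT -> prints the OPTS string (empty for "minimal")
opts_for() {
    case $1 in
        minimal)    echo "" ;;                      # no DSD, no SRC, no IR, no ffmpeg
        dsd-src)    echo "-DDSD -DRESAMPLE" ;;
        dsd-src-mp) echo "-DDSD -DRESAMPLE_MP" ;;   # assumption, see above
        *)          return 1 ;;                     # unknown variant
    esac
}

# Usage inside the squeezelite source tree:
# make clean && make OPTS="$(opts_for dsd-src)"
```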




Build - The Engine


You should be able to complete this exercise in no more than 15 minutes, btw. And you can uninstall it in less than a minute if you like.

Before you start the exercise, make a backup of your pCP installation!
I also strongly recommend installing the latest pCP version, preferably the 64-bit version.

Part 1 - Preparation


1. Increase Filesystem (WEB-UI)

If not done yet, you'll need to increase the pretty tight default 64MB filesystem to at least 200MB first.
The custom build process temporarily requires ~100MB of extra space.
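Not sure whether you've got enough room? A quick check, assuming the pCP data partition is mounted at /mnt/mmcblk0p2 (the path that shows up in the build log quoted in the comments). `free_mb` and `enough_space` are hypothetical helpers built on the POSIX `df -Pk` output.

```shell
#!/bin/sh
# Sketch: check free space before building.

# free_mb PATH -> free space in MB on the filesystem holding PATH
free_mb() {
    # -P: one line per filesystem, -k: 1K blocks; column 4 = available
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# enough_space PATH MB -> succeeds if at least MB megabytes are free
enough_space() {
    [ "$(free_mb "$1")" -ge "$2" ]
}

# Usage on the player:
# enough_space /mnt/mmcblk0p2 100 && echo "OK to build" || echo "resize FS first"
```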

If you have increased your filesystem earlier, you can skip this step.


  1.  Go to the Main Page
  2.  Now select Resize FS.
  3.  Choose e.g. 200MB from the pulldown and start the process by selecting Resize.

The system will now reboot twice. The whole process takes about 2-3 minutes.




2. Update pCP system and applications (WEB-UI)


It's a good idea to have the latest system updates, binaries and libraries installed before we continue.

On the pCP Main Page run 

1. a Full Update of the squeezelite extensions.


2. a Patch Update to install pCP system updates.



That might lead to another reboot.

Now we've got a clean base.

Part 2 - Installation

It's getting serious. 


1. SSH Login (terminal)


Log in via ssh as you already know, or as explained over here if you're new to it.

User:     tc
Password: piCore


2. Installation


I completely changed the installation process (Feb/2021) and introduced soundcheck's tuning kit - pCP (sKit), a toolset that lets you easily get everything installed and configured.


Before you continue, install sKit and run the sKit-custom-squeezelite.sh tool.

Once that is done, you can continue with the configuration below.


Part 3 - Configuration (Web-UI)


This step only applies if you've installed one of the three soundcheck variants. For the standard binary you can skip this step.

You can now add your choice of the (new) parameters that come with the "soundcheck" binary to the Squeezelite Settings / Various Options field in the web browser.

  • -X  = external pro-audio 1dB/click volume control (together with -V <mixer control>)
  • -Y  = internal pro-audio 1dB/click volume control
  • -A  = assign output thread to the last CPU



1st example: 
The screenshot below shows the squeezelite config for my Allo Boss test DAC with external pro-audio-style volume control enabled. The control we have to use for the Allo Boss is called "Digital".
This can differ from DAC to DAC!
Just as a side note/memory refresh:
-W simply lets squeezelite read sample rates from the PCM header in case you resample files on the LMS and stream them down as PCM. I have it in by default.




2nd example: 
The screenshot below shows the squeezelite config with internal pro-audio-style volume control and the squeezelite output thread assigned to the 4th CPU.







Save the new settings. 

And that'll be it. 

Summary

You now have a highly efficient binary in place.  Reading this article was probably the toughest part of the job. 

I hope you consider this exercise, as part of the Audio Streaming project, worth the effort once more.
Hopefully your new engine gets you a smooth and silky ride. 😉


Enjoy!



15 comments:

  1. Hi, Soundcheck,
    I've been following your threads for a while now and I'm glad your site exists. Without your articles, all my equipment would only be half as good, if anything. Thanks for sharing your insights.
    So piCoreplayer has always been beyond reproach for me, but what comes out now with your help, wow. It was already crass after I taught my server how to upsample with your help, then your filter settings but now combined with your custom made binaries.....awesome. I admit, I had to listen to that and it's not earth-shattering, but for me it's a real audible improvement that I really enjoy. I am looking forward to part 2


  2. Hi Souncheck,
    Your articles are great! And my sound has become very much better. Thanks!
    I was trying to make your own squeezelite. But I immediality ran into problems. As I do not speak Linux, I'm stuck. Any help is most welcome,
    Frank

    The error-messages are below the first commands:
    tc@piCorePlayer:~$ tce-load -wi \
    > compiletc \
    > wget \
    > libasound-dev \
    > pcp-libogg-dev \
    > pcp-libflac-dev \
    > pcp-libvorbis-dev \
    > pcp-libmad-dev \
    > pcp-libmpg123-dev \
    > pcp-libalac-dev \
    > pcp-libfaad2-dev \
    > pcp-libsoxr-dev
    rm: can't stat './compiletc.tcz.dep': Input/output error
    Downloading: compiletc.tcz
    rm: can't stat './wget.tcz.dep': Input/output error
    rm: can't stat './libasound-dev.tcz.dep': Input/output error
    Connecting to repo.picoreplayer.org (172.67.157.97:443)
    wget: error getting response: Connection reset by peer
    md5sum: compiletc.tcz.md5.txt: Input/output error
    Error on compiletc.tcz
    tc@piCorePlayer:~$

    1. I changed the command structure so that potential copy/paste issues won't occur.

      Please give it a try.

    2. I just verified the process on a maiden 6.1 installation.


      ####
      tc@piCorePlayer:~$ tce-load -wi compiletc wget
      compiletc.tcz.dep OK
      gawk.tcz.dep OK
      mpfr.tcz.dep OK
      gcc.tcz.dep OK
      isl.tcz.dep OK
      mpc.tcz.dep OK
      gcc_libs-dev.tcz.dep OK
      ...
      ######


      Make sure you've got the filesystem increased to 300MB.
      Make sure you ran the "full update".
      If copy/paste still fails, try to enter the commands manually. Line by line.
      Because the terminal emulator might not handle the control codes properly on a copy/paste.

      Next try would be a fresh install.

    3. I also tried latest putty from W10. No issues.

      As a last option all these additional packages can be installed using the pCP WEB UI.


      However. It seems that somehow the terminal considers the space a CR.
      That's why the shell tries to execute the commands. Something is wrong there.

  3. Done! Works! Great sound!
    What I did was: bypassing the switch and directly wiring the rpi to the router, using another SSH-client, rebooting. Everything together I'm afraid so I don't know what did the trick...
    Many thanks for your help!

  4. Thanks! Working great! First time Allo Usbridge Sig without cracking!

  5. Hi Soundcheck,
    followed instructions, but after choosing a squeezelite update i get a error:
    verifing space requirements
    downloading extensions (~2-3 min)
    verifing extensions
    loading extensions
    downloading sources
    building
    program aborted
    ERROR: compiling binary

    I have an AlloBridge with CM3B+, no problems in compiling your "old" squeezelite on the 6.1.0 PcP.
    Can you advise?

    Regards,
    Remco

    1. Sorry for the issue.

      What OS version are you running?

      I tested RPi4 and RPi3, 32 and 64bit on pCP70.

      Can you get me the output of:

      cat /mnt/mmcblk0p2/tce/sl-custom-build.log

      Info:
      If the program breaks the logs should remain on the SD card.


      In case you want to clean up everything once the program has failed, just run it again and select the "remove custom installation".


  6. Hi Soundcheck,
    I use the pcp 7.0 64bit.
    Just did a new image again. Error repeats.
    Output is this:
    tc@pCP:~$ cat /mnt/mmcblk0p2/tce/sl-custom-build.log
    make: Entering directory '/tmp/squeezelite'
    /tmp/Makefile.sc-rpi-ux-minimal:24: *** "Problems identifiying RPi 3 or 4!". Stop.
    make: Leaving directory '/tmp/squeezelite'

    It might have to do with the RPI Compute Module 3B+ version i use on the AllBridge?

    1. Please contact me! It's CM related issue, which I can't test.

  7. Additional something i noticed.
    If i follow all your instructions in the order presented in your blog i get an error if i set the cpu isolation settings:
    "taskset: invalid option -- 'c' BusyBox v1.31.1 (2020-12-18 22:25:41 EST) multi-call binary. Usage: taskset [-p] [HEXMASK] PID | PROG ARGS Set or get CPU affinity -p Operate on an existing PID"

    Also giving the -A in the "Various options" field causes Squeezelite to not start.

    Perhaps this helps.

    Regards,
    Remco

    1. Both functions can't be turned on at the same time. I made a note in the article.
      Thx. Good catch.

  8. Hi Klaus,
    It works with the AlloBridge Signature now! Thnx!
    Regards,
    Remco
