(Latest Update : Feb-13-2021)
Most of the project is done by now. Your system should be running smoothly as we speak. I hope you're already enjoying the result of your work. If you're still looking for more, there is more. And the good news: there always will be. After all these years I've come to realize there has to be some truth in the "Everything Matters" principle.
Let's give it another shot. The result of this exercise might not be what I'd call earth-shattering, though I'm positive you'll tell me whether it's been worth the effort.
What is this exercise all about?
We're gonna build our own engine. A custom engine. An engine that fits the chassis. What am I talking about? You're gonna build your own custom squeezelite. squeezelite, an engine!?!? Yep. That's where the magic happens, where the fuel gets burned. The entire audio streaming OS, no, actually the entire upstream streaming environment, is built and configured to provide for this tiny application. Until now we've mostly been covering environmental topics. It's time to have a closer look under the hood.
The (open-source) tool I'll provide puts you in a position to build your very own custom engine. And it won't require expert skills.
You'll be able to choose from two squeezelite package sources.
Three variants are offered for each of them:
- minimal - as slim as it gets
- DSD & SRC
- DSD & SRC Multi Processor - to speed up SRC
The tool will auto-detect the platform - RPi3 or RPi4 (incl. CM). It'll also auto-detect the architecture - 32-bit or 64-bit - and then apply platform-specific build parameters and optimizations.
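How such an auto-detection might look in a shell script is sketched below. This is not taken from the tool itself. The function names and the model-string patterns are assumptions made for illustration. Note that `uname -m` only reflects the kernel, not the user-space.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the builder's actual code.

# Map a board model string to a platform tag.
classify_model() {
    case "$1" in
        *"Pi 4"*|*"Compute Module 4"*) echo rpi4 ;;
        *"Pi 3"*|*"Compute Module 3"*) echo rpi3 ;;
        *) echo unknown ;;
    esac
}

# Map a machine name (as printed by `uname -m`) to a word size.
classify_arch() {
    case "$1" in
        aarch64|arm64) echo 64 ;;
        armv7l|armv8l) echo 32 ;;
        *) echo unknown ;;
    esac
}

# On a live Raspberry Pi (the model file is assumed to exist there):
#   model=$(tr -d '\0' < /proc/device-tree/model)
#   echo "$(classify_model "$model") / $(classify_arch "$(uname -m)")-bit"
```

Matching the Compute Module strings explicitly matters, because a CM board does not report itself as a plain "Pi 3" or "Pi 4".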
The idea behind all this effort once more is to go after...
Maximum efficiency
Maximum efficiency !?!? Yep. I hear you.
Let's talk a bit about writing, developing and handling of software
first.
Coding quality, which - besides other things - requires efficiently
written code, is not the only thing that matters when looking into the
efficiency of a program.
There are several parameters that can have an impact on the efficiency
while executing the final binary, such as
- the platform - 32 or 64 bit
- the compiler
- the compiler version
- the compiling options
- application specific compiling features
- specific CPU related optimizations
- application features
- and more
All this is gonna be addressed as part of this exercise.
To put some meat on the bones, I'd like to give some examples.
1.
It does matter if you run a 32-bit or a 64-bit platform. I am talking about the kernel AND the user-space running 64-bit.
The compiler (gcc) uses completely different sets of default compiler options for 32-bit and 64-bit systems.
On a 64-bit system compiler optimizations are to a large extent enabled by default. Many applications will already benefit just by running on a 64-bit platform. Have a look at these benchmarks. This is no hocus pocus.
2.
The compiler is a key player. The compiler and the compiler version can make a difference. Compilers are living creatures. They are continuously evolving (and usually improving).
Debian based systems (RPiOS/Moode/Diet-Pi/Volumio) usually trail the OS crowd on the compiler front (currently gcc 8.x). That's not great. Debian gets up to 2 years behind e.g. Gentoo, Arch Linux and Fedora (and now pCP 7.0) based systems (currently gcc 10.x). Up to 2 years I consider a lot!
3.
Now let's talk a bit about the binary configuration. While preparing to compile a binary you can usually configure certain features to be included or left out. These can be optimizations (SSE, NEON, etc.) or optional functionality like DSD, SRC, infrared or GPIO support, you name it.
As an example: I've been testing and benchmarking the still not officially implemented ARM NEON support for flac and libflac. An optimization. During that little study I found the NEON-enabled flac to be about 27.5% faster than the flac version without it. From a SW designer's perspective that's a lot.
The original squeezelite was very well coded by Adrian (Triode). He put a lot of focus on highest performance and highest efficiency. The default pCP squeezelite binary comes with ALL optional features ON by default. The "lite" factor seems to be going down the drain.
All this is no hocus pocus. Compiling code for a specific CPU using special compiler options and a stripped-down set of features can, and usually does, improve efficiency.
OK. Let's quickly introduce what to expect from
soundcheck's squeezelite version
I am maintaining my own squeezelite version. I keep it in sync with the changes applied to the squeezelite master sources.
I added some small, to me valuable, features, and I decided to share my version with you. You might feel tempted to try it. For the time being, that fork focuses pretty much exclusively on Linux and the RPi3 and RPi4.
What's different compared to the standard squeezelite version?
Main features:
1. Pro Audio Style Volume Control - internal
The digital volume controls you usually find in the consumer world are not "linear".
Linear here means that 1 click equals a 1dB change on a 101-step scale.
That's what this feature introduces. A feature like that is a must in the pro-audio arena.
The internal pro-audio style volume control gets enabled by adding -Y to the Squeezelite Settings / Various Options field once the custom binary is up and running.
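To get a feeling for what a 1dB/click scale means in terms of signal level, here is a small sketch. The mapping of steps to dB (step 100 = 0dB, each click down = -1dB) is an assumption for illustration and not necessarily how the fork maps its 101 steps. The dB to gain conversion itself is the standard formula gain = 10^(dB/20).

```shell
#!/bin/sh
# Hypothetical mapping, for illustration only:
# step 100 = 0 dB (full scale), each step down = -1 dB.

# Attenuation in dB for a volume step in the range 0..100.
step_to_db() {
    echo $(( $1 - 100 ))
}

# Linear gain factor for a level in dB: gain = 10^(dB/20).
db_to_gain() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(db * log(10) / 20) }'
}

# 6 clicks below full scale is roughly half the amplitude.
db_to_gain "$(step_to_db 94)"
```

Every click changes the level by the same ratio, no matter where on the scale you are. That is the whole point of a dB-linear control.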
2. Pro Audio Style Volume Control - external
Certain audio interfaces offer excellent on-DAC volume controls. They use DAC chips that offer linear ranges, usually from 0dBFS down to -128dB.
Some expose it to the audio driver and OS in 0.5dB steps, others in 1dB steps. In the end it's up to the audio device manufacturer how to implement it.
The first thing you need to figure out before using this feature is whether your audio device manufacturer supports external volume control at all. Then you need to figure out what the mixer control is called inside your OS.
As an example: the Allo Boss offers external volume control. The mixer control for this audio interface is named by Allo through the audio driver and is called Digital. Many other manufacturers call the relevant volume control mixer Master.
You could figure it out by simply typing squeezelite -L or amixer in a terminal if you're logged
in via ssh.
To enable the external pro-audio volume control feature you'd have to enable it under Squeezelite Settings / Various Options.
To stick with the above example of a DAC that offers external volume control, the Allo Boss: you'd have to add the external volume control option (-X, see Part 3) for its Digital control to the options field.
Another nice side-effect of running an external VC - besides having a 1 click = 1dB scale - is that basically no change is made to the audio stream before it hits the DAC HW. The internal volume control simply gets bypassed.
Using such a feature is mandatory if you'd like to play a DoP file and intend to use a software-based volume control. The ESS Sabre ES9038 family supports this feature.
Keep in mind though. The implementation quality of external controls can
differ. As an example I'd like to mention the Khadas Toneboard. Even though
it is offering external volume control via USB, accessible through the USB
audio driver, it's not the excellent volume control that's being offered by
the ESS Sabre 9038DAC chip itself. It's a simple MCU onboard implementation.
Therefore DoP volume control doesn't work on the KTB. And that's why I am
not using it. Towards my Toneboard I'm using my own linear internal volume
control feature of squeezelite and skip DoP.
3. CPU Affinity Output Thread
The CPU affinity option -A allows attaching the ALSA output thread to a single dedicated CPU.
The idea is to let the final processing stage, before the audio stream hits the audio device, run without any distractions, exclusively on one CPU. I mentioned this method earlier in the series.
With this feature built into squeezelite there's no need for external tools or settings to accomplish this anymore.
Entering -A in the Squeezelite Settings / Various Options field assigns just the squeezelite output thread (ALSA only) to the last - the 4th - CPU.
To get the maximum out of this feature the 4th CPU has to be isolated. An isolated CPU runs exclusively for the tasks assigned to it. Kernel-related actions such as CPU load-sharing mechanisms are no longer applied to it.
This is IMO a key tweak! Don't miss it!!!
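For comparison, this is roughly what the "external tools" route looks like. The BusyBox taskset that ships with pCP expects the CPU selection as a hex mask, where bit N stands for CPU N. The helper name cpu_mask is made up for this sketch, and the PID is a placeholder.

```shell
#!/bin/sh
# Hex affinity mask selecting a single CPU (CPU numbers start at 0).
cpu_mask() {
    printf '0x%x\n' $(( 1 << $1 ))
}

# Mask for the 4th CPU (CPU3).
cpu_mask 3

# Possible use on a running system (<PID> is a placeholder):
#   taskset -p "$(cpu_mask 3)" <PID>
```

taskset works on one task ID at a time, so pinning just the output thread from the outside would mean finding that thread's ID first. The -A option saves you that step.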
UPDATE Feb-01-2021:
From version 1.4 of the "pCPt custom squeezelite builder" onwards all below CPU isolation and task affinity settings are done automatically by the provided program for the "soundcheck" custom binary variants. Old settings will be overwritten!
All this simply to make things easier for you and to make sure that everything is done properly.
CPU isolation you configure under
Tweaks /
pCP Kernel Tweaks / CPU Isolation as pointed out in
the base (OS setup) article.
There's been quite a change since Jan-31-2021. It turned out that the foundation kernel no longer supports CPU0 isolation. (I filed an official ticket about the issue over at GitHub. Response: "who needs that !?!?")
That forced me to change strategy. I know many people have been following this CPU0 isolation approach.
If you still see it elsewhere, forget it. It won't work anymore!
We now isolate CPU2 and CPU3. CPU2 is left for the squeezelite main process and two further threads, and CPU3 is used for the squeezelite output thread.
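If you'd like to verify which CPUs are actually isolated, the sketch below may help. It assumes the isolation is done through the Linux kernel's isolcpus= boot parameter, which is the usual mechanism. The function name is made up for illustration.

```shell
#!/bin/sh
# Extract the isolcpus= value from a kernel command line string.
isolated_from_cmdline() {
    for word in $1; do
        case "$word" in
            isolcpus=*) echo "${word#isolcpus=}" ;;
        esac
    done
}

# On a live system:
#   isolated_from_cmdline "$(cat /proc/cmdline)"
#   cat /sys/devices/system/cpu/isolated
```

With CPU2 and CPU3 isolated, the first command would be expected to report something like 2,3. The sysfs file shows what the running kernel actually applied.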
Currently (Feb-01-2021) the "Squeezelite CPU" affinity is not working properly. pCP-Paul has been informed and has confirmed it will get fixed soon.
IMPORTANT NOTE:
pCP itself also offers an option for this specific CPU affinity feature. Paul and I discussed this tweak some time ago. He then made it available via the Web-UI. In the earlier OS setup article I said to assign the Squeezelite Output CPU to the isolated CPU 3. If you stay with the standard (custom) squeezelite binary this still applies. Keep it at 3.
However, if you plan to apply -A for my new soundcheck-custom-squeezelite you MUST turn off the pCP feature. If both settings are enabled at the same time squeezelite won't start.
Now.
Scope
What we'll do during the build exercise:
- we'll use the OS-provided compiler (gcc 10.x on pCP 7.0) and
- RPi model specific compiler options
- we'll build a dynamically linked binary
- without debugging options
- and limit the embedded features to a minimum (no IR, no DSD, no resampling, no ffmpeg, etc.)
- you can take it a notch further by making use of the soundcheck version
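To illustrate what "RPi model specific compiler options" can mean, here is a sketch of a flag selection. The flags are plausible gcc options for the Cortex-A53 (RPi3) and Cortex-A72 (RPi4) cores, but they are an assumption. The builder's own Makefiles may well use a different set.

```shell
#!/bin/sh
# Hypothetical flag selection, NOT the builder's actual Makefile logic.
# $1 = rpi3|rpi4, $2 = 32|64
cpu_flags() {
    case "$1" in
        rpi3) cpu=cortex-a53 ;;
        rpi4) cpu=cortex-a72 ;;
        *) echo "unsupported platform: $1" >&2; return 1 ;;
    esac
    case "$2" in
        64) echo "-O2 -mcpu=$cpu" ;;
        32) echo "-O2 -mcpu=$cpu -mfpu=neon-fp-armv8 -mfloat-abi=hard" ;;
        *) echo "unsupported word size: $2" >&2; return 1 ;;
    esac
}

cpu_flags rpi4 64
```

On 64-bit ARM the FPU and NEON are always part of the architecture, so the extra -mfpu and -mfloat-abi switches are only needed for 32-bit builds.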
Build - The Engine
You should be able to conclude this exercise in no more than 15 minutes btw. And you can uninstall it in less than a minute if you like.
Before you start the exercise make a backup of your pCP installation!
I also strongly recommend installing the latest pCP version, preferably the 64-bit version.
Part 1 - Preparation
1. Increase Filesystem (WEB-UI)
If not done yet, you need to increase the pretty tight default 64MB filesystem to at least 200MB first.
The custom build process temporarily requires ~100MB of extra space.
If you have increased your filesystem earlier, you can skip this step.
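If you're unsure whether there is enough room already, a quick check from an ssh session could look like the sketch below. The helper name has_space is made up, and the mount point /mnt/mmcblk0p2 is assumed to be the SD card partition holding the tce directory.

```shell
#!/bin/sh
# Succeeds if the available space (in KB) covers the required amount (in MB).
has_space() {
    [ "$1" -ge $(( $2 * 1024 )) ]
}

# On a live system (mount point assumed):
#   avail_kb=$(df -k /mnt/mmcblk0p2 | awk 'NR==2 { print $4 }')
#   has_space "$avail_kb" 100 && echo "enough room" || echo "resize first"
```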
- Go to the Main Page
- Now enter Resize FS
- Choose e.g. 200MB from the pulldown and start the process by selecting Resize.
The system will now reboot twice. The whole process takes about 2 - 3 minutes.
2. Update pCP system and applications (WEB-UI)
It's a good idea to have the latest system updates, binaries and libraries installed before we continue.
On the pCP Main Page run
1. a Full Update of the squeezelite extensions
2. a Patch Update for installing pCP system updates
That might lead to one more reboot.
Now we've got a clean base.
Part 2 - Installation
It's getting serious.
1. SSH Login (terminal)
Log in via ssh as you already know, or as explained over
here if you're new to it.
User : tc
Password: piCore
2. Installation
I completely changed the installation process (Feb/2021). I introduced soundcheck's tuning kit - pCP (sKit), a toolset that lets you easily get everything installed and configured.
Before you continue, install sKit and run the sKit-custom-squeezelite.sh tool.
Once that is done you can continue with the configuration below.
Part 3 - Configuration (Web-UI)
This step only applies if you've installed one of the three soundcheck versions. For the standard
binary you can leave this step alone.
You can now add your choice of (new) parameters which come with the "soundcheck" binary to the Squeezelite Settings / Various Options field inside the web browser.
- -X = external pro-audio 1dB/click volume control
- -Y = internal pro-audio 1dB/click volume control
- -A = assign output thread to last CPU
1st example:
The screenshot below shows the squeezelite config for my Allo Boss test DAC with external pro-audio-style volume control enabled. The control we have to use for the Allo Boss is called "Digital".
This can differ from DAC to DAC!
Just as a side-note/memory refresh:
The -W option simply allows squeezelite to read sample rates from the PCM header in case you resample files on the LMS and stream them down as PCM. I have it in by default.
2nd example:
The screenshot below shows the squeezelite config with
internal pro-audio-style volume control and the squeezelite
output thread assigned to the 4th CPU.
Save the new settings.
And that'll be it.
Summary
You now have a highly efficient binary in place. Reading this article was probably the toughest part of the job.
I hope you consider this exercise, as part of the Audio Streaming project, once more worth the effort.
Hopefully your new engine gets you a smooth and silky ride. 😉
Enjoy!
Hi, Soundcheck,
I've been following your threads for a while now and I'm glad your site exists. Without your articles, all my equipment would only be half as good, if anything. Thanks for sharing your insights.
So piCoreplayer has always been beyond reproach for me, but what comes out now with your help, wow. It was already crass after I taught my server how to upsample with your help, then your filter settings but now combined with your custom made binaries.....awesome. I admit, I had to listen to that and it's not earth-shattering, but for me it's a real audible improvement that I really enjoy. I am looking forward to part 2
Hi Soundcheck,
Your articles are great! And my sound has become very much better. Thanks!
I was trying to make your own squeezelite. But I immediately ran into problems. As I do not speak Linux, I'm stuck. Any help is most welcome,
Frank
The error-messages are below the first commands:
tc@piCorePlayer:~$ tce-load -wi \
> compiletc \
> wget \
> libasound-dev \
> pcp-libogg-dev \
> pcp-libflac-dev \
> pcp-libvorbis-dev \
> pcp-libmad-dev \
> pcp-libmpg123-dev \
> pcp-libalac-dev \
> pcp-libfaad2-dev \
> pcp-libsoxr-dev
rm: can't stat './compiletc.tcz.dep': Input/output error
Downloading: compiletc.tcz
rm: can't stat './wget.tcz.dep': Input/output error
rm: can't stat './libasound-dev.tcz.dep': Input/output error
Connecting to repo.picoreplayer.org (172.67.157.97:443)
wget: error getting response: Connection reset by peer
md5sum: compiletc.tcz.md5.txt: Input/output error
Error on compiletc.tcz
tc@piCorePlayer:~$
I changed the command structure so that potential copy/paste issues won't occur.
Please give it a try.
I just verified the process on a maiden 6.1 installation.
####
tc@piCorePlayer:~$ tce-load -wi compiletc wget
compiletc.tcz.dep OK
gawk.tcz.dep OK
mpfr.tcz.dep OK
gcc.tcz.dep OK
isl.tcz.dep OK
mpc.tcz.dep OK
gcc_libs-dev.tcz.dep OK
...
######
Make sure you've got the filesystem increased to 300MB.
Make sure you ran the "full update".
If copy/paste still fails, try to enter the commands manually. Line by line.
Because the terminal emulator might not handle the control codes properly on a copy/paste.
Next try would be a fresh install.
I also tried latest putty from W10. No issues.
As a last option all these additional packages can be installed using the pCP WEB UI.
However. It seems that somehow the terminal considers the space a CR.
That's why the shell tries to execute the commands. Something is wrong there.
Done! Works! Great sound!
What I did was: bypassing the switch and directly wiring the rpi to the router, using another SSH-client, rebooting. Everything together I'm afraid so I don't know what did the trick...
Many thanks for your help!
Great.
Thx for the feedback.
Thanks! Working great! First time Allo Usbridge Sig without cracking!
Hi Soundcheck,
followed instructions, but after choosing a squeezelite update I get an error:
verifing space requirements
downloading extensions (~2-3 min)
verifing extensions
loading extensions
downloading sources
building
program aborted
ERROR: compiling binary
I have an AlloBridge with CM3B+, no problems in compiling your "old" squeezelite on the 6.1.0 PcP.
Can you advise?
Regards,
Remco
Sorry for the issue.
What OS version are you running?
I tested RPi4 and RPi3, 32 and 64bit on pCP70.
Can you get me the output of:
cat /mnt/mmcblk0p2/tce/sl-custom-build.log
Info:
If the program breaks the logs should remain on the SD card.
In case you want to clean up everything once the program has failed, just run it again and select the "remove custom installation".
Hi Soundcheck,
I use the pcp 7.0 64bit.
Just did a new image again. Error repeats.
Output is this:
tc@pCP:~$ cat /mnt/mmcblk0p2/tce/sl-custom-build.log
make: Entering directory '/tmp/squeezelite'
/tmp/Makefile.sc-rpi-ux-minimal:24: *** "Problems identifiying RPi 3 or 4!". Stop.
make: Leaving directory '/tmp/squeezelite'
It might have to do with the RPi Compute Module 3B+ version I use on the AlloBridge?
Please contact me! It's CM related issue, which I can't test.
Something additional I noticed.
If I follow all your instructions in the order presented in your blog I get an error if I set the CPU isolation settings:
"taskset: invalid option -- 'c' BusyBox v1.31.1 (2020-12-18 22:25:41 EST) multi-call binary. Usage: taskset [-p] [HEXMASK] PID | PROG ARGS Set or get CPU affinity -p Operate on an existing PID"
Also giving the -A in the "Various options" field causes Squeezelite to not start.
Perhaps this helps.
Regards,
Remco
Both functions can't be turned on at the same time. I made a note in the article.
Thx. Good catch.
Hi Klaus,
It works with the AlloBridge Signature now! Thnx!
Regards,
Remco