Saturday, May 14, 2011

CD Extraction

With this article I'd like to tackle an issue that always makes me feel uneasy
when I think about it.

"CD Extraction"...

... do I really have it 100% under control ...

... after hundreds of rips and years of active involvement in computer-based audio!?!?!

I just read two articles in the latest issue of the German audio magazine "Stereo". Stereo is one of the biggest magazines of its kind here in Germany, if not the biggest.
I think they have quite a good reputation in the market.

This month (05/11) Stereo stuck their heads into computer CD drives and extraction software to compare the results produced by the different tools and drives.

The really interesting thing was the outcome.
The sound quality ranking of the extraction results on different drives, and of the different software in particular, got my full attention.
According to Stereo, both drives and tools make a difference in sound quality, and judging by the SQ ranking they are not talking about subtle differences. Drives go as low as 88% (Sony Optiarc DRX-S77U) and extraction tools as low as 94% (EAC!!!!) in the SQ ranking.

Guess what: EAC was the worst of all.

Do I have any reason to question the Stereo article, assuming reasonably professional test cases and setups (using Accurate Rip etc.)?!


I consider myself quite open and tolerant when somebody reports differences in sound quality that shouldn't exist in theory.
Have a look at my SB Touch Toolbox. It wouldn't exist if I had followed all those "theories".
A theory works if you're aware of all the facts and take all of them into account,
or if you're able to explain exactly what your theory is based on.
The more complex things get, the harder it is to consider every aspect, of course.
If you don't know all the aspects, you'll have a hard time explaining them, and you won't be able to take them into account either.

Of course, we all know that audio reviews in particular can't do more than give a rough direction. Sound quality rankings especially are usually pretty subjective.

However, I honestly doubt that anyone can hear differences between presumably 100% identical files in a carefully chosen test environment. (Let's see. Perhaps they'll invite me for a session. ;) )

What to do now? Torture Google, check out the community at Audio Asylum,
get in touch with Stereo, do a little study myself!?!?!
Guess what: I did all of that.

Google didn't come up with satisfactory results; as usual, lots of fragmented stuff. At Audio Asylum I had to face the usual lectures. Ulrich Wienforth, the person in charge at Stereo, responded to my "expressed" doubts by mail and defended the Stereo position. That was expected and, IMO, fair enough. (In this case I don't see the commercial or marketing motives you often find behind typical reviews, because at least the extraction tools they tested were all free of charge.) For those who have read the articles: what he didn't mention in the article is that the listening test was done via a streaming client. He did mention that by mail! (If you know the article, I consider this important information.)

Just to make it clear: I do not question that Stereo experienced differences.
What I question, first of all, are their test cases and test environment.

I knew I had to do more than just argue. I wanted to make 100% sure,
and to show, that all extracted files "can" be 100% identical.
I had to show them, and myself, that I'm right about that very minimum requirement.

To conduct my little study I intended to come up with a 100% watertight comparison. My idea was to extract the same track with multiple tools
and drives and run a SHA-2 256-bit checksum over the entire file.
The checksums provided by the extraction tools are IMO useless for this purpose. You can't use them for a 1:1
comparison across different tools, since they all use different algorithms.
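For anyone who wants to reproduce the whole-file comparison without sha256sum, here is a minimal Python sketch of the same idea using the standard-library hashlib module. The helper names sha256_file and all_identical are hypothetical, not part of any ripping tool; the hex digest should match what sha256sum prints for the same file.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 over the entire file: header, PCM data and any tag chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so a long track doesn't have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """True if all given files (at least one) share the same SHA-256 digest."""
    digests = {sha256_file(p) for p in paths}
    return len(digests) == 1
```

Printing f"{sha256_file(p)}  {p}" for each rip gives the same two-column layout as the listings in Appendix 1.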

By comparing the full file I cover PCM data, headers, zero bytes and metadata all together. IMO that's the only way to compare apples with apples. Note that just looking at, e.g., the file size wouldn't be sufficient:
if you change the drive offset, the file size remains the same, but the content differs!
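To illustrate why an offset changes the content but not the size, here's a small Python sketch that shifts raw 16-bit stereo PCM by a number of samples. It is a simplification under my own assumptions: the gap is filled with digital silence (a real drive would read the neighbouring samples instead), the sign convention is chosen only for illustration, and apply_offset is a hypothetical helper.

```python
import hashlib

BYTES_PER_SAMPLE = 4  # one 16-bit stereo sample: 2 channels x 2 bytes


def apply_offset(pcm: bytes, offset_samples: int) -> bytes:
    """Shift raw PCM by offset_samples, keeping the length unchanged.

    Positive offset: drop samples at the start, pad silence at the end.
    Negative offset: pad silence at the start, drop samples at the end.
    """
    shift = offset_samples * BYTES_PER_SAMPLE
    if shift >= 0:
        shifted = pcm[shift:] + bytes(shift)
    else:
        shifted = bytes(-shift) + pcm[:shift]
    return shifted[:len(pcm)]


pcm = bytes(range(256)) * 16          # stand-in for real audio data
shifted = apply_offset(pcm, 30)
print(len(shifted) == len(pcm))       # True: same size
print(hashlib.sha256(shifted).hexdigest() == hashlib.sha256(pcm).hexdigest())  # False: different content
```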

I wanted to make 100% sure that a difference in sound quality cannot be attributed to the most obvious cause: different file content.

There might be other issues, like disk fragmentation causing a different load and thus different jitter during playback of identical files, or similar. But let's keep those speculations for a later stage. There are easy ways to cope with this issue.
First I'd like to prove that all files can be extracted 100% identically,
leaving aside things like very messy drives or scratched CDs.

CD Extraction tools - Analysis - Summary

I tested three different subjects:

1. Extraction Tool Comparison
2. Drive Comparison
3. FLAC encoding/decoding

My choice of tools was cdparanoia (Linux), iTunes, dbPoweramp, EAC and foobar (pretty much in line with the Stereo review; they didn't have dbP on the list because it isn't free, and I added it because to me it's the reference app).

The target format is RIFF WAV at 16 bits and 44.1 kHz.

I ran test cases with and without drive offset on a Plexwriter Premium 1
drive (still a reference drive) and a standard Toshiba DVD 5372V.

I ran two different checksum tests on the extracted files, for rips both with and without drive-offset correction:

1. internal PCM SHA-1 checksum - with shntool
2. SHA-2 checksum over the entire file - with sha256sum


cdparanoia (cdp), EAC (eac), dbPoweramp (dbp) and foobar (fob) deliver 100% identical results in all tests.
There are no differences to be found in the files, which actually makes the PCM test redundant.

Metadata settings must be disabled in EAC and dbP to avoid
a metadata tag footer (yes, inside a .wav file) being appended to the .wav file.
I only figured this out because I ran the PCM checksum test case!!!
The PCM checksums of those files were identical, the file checksums were not.
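The difference between a PCM checksum and a file checksum can be sketched in Python: walk the RIFF chunk list and hash only the payload of the "data" chunk, so an appended tag chunk doesn't change the digest. This only illustrates the idea; I'm not claiming it reproduces byte-for-byte what shntool hash -s does internally, and pcm_sha1 is a hypothetical name.

```python
import hashlib
import struct


def pcm_sha1(path):
    """SHA-1 over the payload of the WAV 'data' chunk only.

    Walks the RIFF chunk list instead of assuming a fixed 44-byte header,
    so extra chunks (e.g. a tag footer) don't affect the digest.
    """
    with open(path, "rb") as fh:
        head = fh.read(12)
        if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
            raise ValueError(f"{path}: not a RIFF/WAVE file")
        while True:
            header = fh.read(8)
            if len(header) < 8:
                raise ValueError(f"{path}: no 'data' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                return hashlib.sha1(fh.read(chunk_size)).hexdigest()
            # Skip this chunk; RIFF pads odd-sized chunks to an even length.
            fh.seek(chunk_size + (chunk_size & 1), 1)
```

Running this and a whole-file hash side by side on a tagged and an untagged rip reproduces the situation described above: identical PCM digests, different file digests.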

iTunes delivers results identical to the other tools in test cases where NO drive offset (non-compliant Accurate Rip mode) is configured.
Drive-offset correction is not possible with iTunes!!!

The drives I tested deliver identical results if the Accurate Rip drive offset is used. Without the AR drive offset configured, the drives do not deliver the
same PCM data!!! That's why iTunes will never deliver identical data from different drives.

Just out of curiosity, I added a FLAC encoding/decoding test case to my little study.
I wanted to verify whether the SHA checksum of the original .wav file remains identical after FLAC encoding and decoding. As part of the test I added tags to the FLAC file during encoding, and I used different compression levels for the test cases.
I can confirm that, as expected, the pre- and post-checksums are identical.
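A round-trip check like this can be scripted. The Python sketch below assumes the flac command-line tool is on the PATH and uses its -0 to -8 compression levels together with -f (overwrite), -o (output file), -d (decode) and --tag=FIELD=VALUE; build_commands, sha256_of and flac_roundtrip_matches are hypothetical helper names, and the _flcN naming merely mimics the files in Appendix 1.

```python
import hashlib
import subprocess
from pathlib import Path


def build_commands(wav, level=8, tags=("COMMENT=roundtrip test",)):
    """Return (encode_cmd, decode_cmd, decoded_wav) for one compression level."""
    wav = Path(wav)
    flac_file = wav.with_name(f"{wav.stem}_flc{level}.flac")
    decoded = wav.with_name(f"{wav.stem}_flc{level}.wav")
    encode = ["flac", f"-{level}", "-f", "-o", str(flac_file)]
    encode += [f"--tag={t}" for t in tags]  # tags end up in the FLAC file only
    encode.append(str(wav))
    decode = ["flac", "-d", "-f", "-o", str(decoded), str(flac_file)]
    return encode, decode, decoded


def sha256_of(path):
    """Full-file SHA-256, as sha256sum would report it."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def flac_roundtrip_matches(wav, level=8):
    """Encode to FLAC, decode again, and compare input and output byte for byte."""
    encode, decode, decoded = build_commands(wav, level)
    subprocess.run(encode, check=True)
    subprocess.run(decode, check=True)
    return sha256_of(wav) == sha256_of(decoded)
```

Calling flac_roundtrip_matches("track01_cdp_d1_o000.wav", 0) and again with level 8 would correspond to the flc0/flc8 cases. As far as I know, flac writes a plain WAV header when decoding, so a source .wav carrying extra chunks (such as the tag footer mentioned earlier) probably won't round-trip byte for byte; there a PCM-only checksum is the fairer comparison.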


From my perspective the results look great.

The tools "can" deliver 100% identical results. The key challenge is
to configure them correctly to do so.

It took some digging into the setup menus of EAC and dbPoweramp
to get there.

As for the drives: extraction results will all differ unless you
use the drive offset as published by Accurate Rip.

There seems to be one question mark behind the Accurate Rip drive-offset database, though. It has been discussed on the net, with EAC author Andre Wiethoff involved, that the drive offsets specified by Accurate Rip might not be correct.
The number should actually be 30 samples lower than the AR-reported number.
That would mean that the entire Accurate Rip database is off,
and AFAIK it hasn't been corrected since.
When they recently introduced their improved AR checksum (Accurate Rip comes from the dbPoweramp people), AR should IMO have corrected this potential offset problem as well.

Still, getting equal results from different drives is probably better than working without drive-offset adjustments at all.
Today's "reference data" would just be wrong in a very consistent way.
And if a consistently wrong drive offset doesn't harm the extraction result and the SQ experience, working with such an offset wouldn't be an issue at all.

The other problem is that Accurate Rip leaves out the first 5 frames (you'll also read about 2940 samples) of the first track and the last 5 frames of the last track. That's done on purpose: these are the critical, usually inconsistent areas when different drives are used. They'd never manage to build a reliable reference if they included this data.
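The 2940 figure is simple arithmetic: a CD audio sector (called a frame here) holds 2352 bytes, i.e. 588 stereo samples of 4 bytes each, and 5 × 588 = 2940 samples, roughly 67 ms of audio. The Python sketch below shows which sample range enters the comparison under the simplified rule just described; checked_range is a hypothetical helper, not Accurate Rip's actual implementation, which may handle the edges in more detail.

```python
SAMPLE_RATE = 44_100
SAMPLES_PER_SECTOR = 2352 // 4            # 2352-byte audio sector / 4 bytes per stereo sample = 588
SKIPPED_SAMPLES = 5 * SAMPLES_PER_SECTOR  # 5 frames = 2940 samples


def checked_range(track_samples, first_track, last_track):
    """Half-open [start, end) range of samples that enter the comparison.

    Simplified reading of the rule described above, not Accurate Rip's code.
    """
    start = SKIPPED_SAMPLES if first_track else 0
    end = track_samples - SKIPPED_SAMPLES if last_track else track_samples
    return start, end


print(SKIPPED_SAMPLES, round(SKIPPED_SAMPLES * 1000 / SAMPLE_RATE, 2))  # 2940 66.67
```

On a single-track CD both ends are affected, which is where the "several thousand samples" out of the equation come from.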

I do think Accurate Rip should fix at least the offset issue, if the "-30 sample" issue is verified and confirmed.

It's not nice to have several thousand samples left out of the equation either.
But I guess that's the price to pay for a standardized one-size-fits-all approach.

The next thing to figure out is whether the drive-offset problem really causes differences in sound quality. Folks, you're invited to join the club of testers.
I'd guess that many of you reading this run highly resolving systems.
Note: if there were differences between those files, you wouldn't notice them on a standard audio system.

Just subtract 30 samples from your AR drive-offset figure and rip again. (Please let me know if you experience any difference.)
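To get a feel for the magnitudes involved, here's a small Python sketch that converts an offset in samples into bytes and milliseconds (16-bit stereo at 44.1 kHz, so 4 bytes per sample) and applies the -30 correction to the two AR offsets listed in Appendix 2. The helper names are hypothetical, and the -30 figure is the unconfirmed value discussed above, not an established correction.

```python
SAMPLE_RATE = 44_100        # CD audio: samples per second per channel
BYTES_PER_SAMPLE = 4        # one stereo sample = 2 channels x 16 bit
SUSPECTED_CORRECTION = -30  # the unconfirmed "-30 samples" figure, hypothetical


def describe_offset(samples):
    """Return (bytes, milliseconds) for an offset given in samples."""
    return samples * BYTES_PER_SAMPLE, samples * 1000 / SAMPLE_RATE


def adjusted_offset(ar_offset):
    """AR drive offset with the suspected -30 sample correction applied."""
    return ar_offset + SUSPECTED_CORRECTION


# AR drive offsets as listed in Appendix 2.
for name, ar in {"Plextor Premium 1": 30, "Toshiba SDR 5372V": 704}.items():
    print(f"{name}: AR {ar:+d} -> re-rip with {adjusted_offset(ar):+d}")

size, ms = describe_offset(30)
print(size, round(ms, 2))  # 120 0.68
```

So the disputed difference amounts to 120 bytes, well under a millisecond of audio, shifted across the whole rip.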

iTunes doesn't offer any drive-offset option right now.
Whether that's any better, I doubt it. I certainly wouldn't rip any serious data with iTunes. As mentioned before, you'll get different results on different drives and you won't end up with a reference-quality rip.
Of course, that applies to rips made with any other tool as well if they run without offset correction.

Advice: before you start ripping your CDs, make sure that different drives and tools generate the same results. Just one wrong setting anywhere might change your result.

During my little study I ran sha256sum, a SHA-2 (state-of-the-art) checksum, on each file.
That's more reliable than any checksum offered by an extraction tool or Accurate Rip, and it lets me compare the files directly.

Bottom line.

The tested drives and tools deliver 100% identical results if the configuration is done right. From that perspective there shouldn't be any difference in SQ.

If you don't use Accurate Rip drive offsets, every rip will be different on every drive. And that might be an issue.

As for the extraction tools: they pretty much all deliver exactly the
same data, if the setup is correct.

From my perspective, this conclusion should provide a solid base for any further investigation.

Finally, I did it: I sat down and listened to all of the test files I generated.
Honestly, I had a hard time identifying any difference between the files.
Perhaps my system is not good enough, or my hearing just isn't sufficient. I'd love to sit down with the Stereo guys and run that test at their site.

I hope you find this article somewhat interesting. As always, feedback is more than welcome.

In the appendices below you'll find the results of my test cases and a description of the test environment.


Appendix 1:
soundcheck's checksum test Rev 1 --- Sun May 15 11:15:52 CEST 2011


:::Checksum SHA-1 internal PCM - Drive 1: with drive offset::::::::::::::::::::::::::::::::::::
8fc5a1f4332d6b20c9dcfbdb74220ada9b84ee0c  /track01_cdp_d1_o030.wav
8fc5a1f4332d6b20c9dcfbdb74220ada9b84ee0c  /track01_dbp_d1_o030.wav
8fc5a1f4332d6b20c9dcfbdb74220ada9b84ee0c  /track01_eac_d1_o030.wav
8fc5a1f4332d6b20c9dcfbdb74220ada9b84ee0c  /track01_fob_d1_o030.wav

:::Checksum SHA-1 internal PCM - Drive 1: without drive offset:::::::::::::::::::::::::::::::::
a82ff6f76c7f336db922c9d210b2d0d6a7cbead3  /track01_cdp_d1_o000.wav
a82ff6f76c7f336db922c9d210b2d0d6a7cbead3  /track01_dbp_d1_o000.wav
a82ff6f76c7f336db922c9d210b2d0d6a7cbead3  /track01_eac_d1_o000.wav
a82ff6f76c7f336db922c9d210b2d0d6a7cbead3  /track01_fob_d1_o000.wav
a82ff6f76c7f336db922c9d210b2d0d6a7cbead3  /track01_itu_d1_o000.wav

:::Checksum SHA-2(256) entire file - Drive 1: with drive offset::::::::::::::::::::::::::::::::
49eacbf1192931289158fbdd72ec56353d65aaa4086fa432eb161290049909e6  /track01_cdp_d1_o030.wav
49eacbf1192931289158fbdd72ec56353d65aaa4086fa432eb161290049909e6  /track01_dbp_d1_o030.wav
49eacbf1192931289158fbdd72ec56353d65aaa4086fa432eb161290049909e6  /track01_eac_d1_o030.wav
49eacbf1192931289158fbdd72ec56353d65aaa4086fa432eb161290049909e6  /track01_fob_d1_o030.wav

:::Checksum SHA-2(256) entire file - Drive 1: without drive offset:::::::::::::::::::::::::::::
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_cdp_d1_o000.wav
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_dbp_d1_o000.wav
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_eac_d1_o000.wav
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_fob_d1_o000.wav
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_itu_d1_o000.wav


:::Checksum SHA-1 internal PCM - Drive 2::::::::::::::::::::::::::::::::::::::::::::::::::::::::
f1f2cb11c121608ced3df64bca36051fe05e1638  /track01_dbp_d2_o000.wav
8fc5a1f4332d6b20c9dcfbdb74220ada9b84ee0c  /track01_dbp_d2_o701.wav

:::Checksum SHA-2(256) entire file - Drive 2::::::::::::::::::::::::::::::::::::::::::::::::::::
9ee0a2f83b2ebc5d9810bb4af343c9541b53c2e1462b4ea40e411e17eab759b2  /track01_dbp_d2_o000.wav
49eacbf1192931289158fbdd72ec56353d65aaa4086fa432eb161290049909e6  /track01_dbp_d2_o701.wav


:::Checksum SHA-2(256) entire file - FLAC en-/decode:::::::::::::::::::::::::::::::::::::::::::
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_cdp_d1_o000_flc0.wav
43be2e2ecd260699165a874efaf7bcfc88c3e0a3c02f0b32fc821ce58786addf  /track01_cdp_d1_o000_flc8.wav

Tools used: 1. shntool 2. sha256sum 3. flac
Appendix 2:
Tools used:


Ubuntu 11.01
Windows 7

Plextor Premium 1 (AR drive-offset : +30)
Toshiba SDR 5372V (AR drive-offset : +704)

CD Extractors:

1. EAC
2. dbPoweramp
3. iTunes (Windows 7)
4. Foobar
5. cdparanoia (Linux)


Via the terminal command line:

sha256sum (Linux) 8.5: 256-bit SHA-2 checksum over the entire file

sha256sum <filename>

shntool (Linux) 3.0.7: SHA-1 checksum over the PCM data only

shntool hash -s <filename>

Of course I've written a little program to run all tests automatically.


  1. How do you know what your drive offset is (or should be)?

  2. You can have dbPoweramp or EAC look them up,
    or you can look them up yourself:

  3. Hi Klaus,
    have a look at This is my way to get the offset WITHOUT AR.
    More Work, but "Hands On".

  4. Hm... I always wondered, why audiophile enthusiasts are willing to spend 1000s of bucks for a "CD transport" to hook it up in front of a standalone DAC, while simple CD Rom Players seemed to be able to make 100% accurate CD rips even in much less time... However, I was happily ready to believe, that a 50$ DVD Burner would read CDs as accurately as a 2000$ CD transport, and that until now, no one had tried to stress my budget, trying to convince me of a "high end" CD ripping drive... Now there seem to be question marks behind all those assumptions.

    However, now back to our study: thanks a million for that, it clarifies a lot!!! 100% = 100% accurate... doesn't it? (not quite the same comparison as the 100% volume dispute on the Squeezebox, though ;-) )

    What exactly would you suggest as settings for EAC to be sure that we get those 100% results?

    Thanks and best regards! Urs

  5. Regarding Stereo's testing... I slightly disagree with relying on listening tests alone. What about producing a proper CD with a known test signal, or a piece of music you first had as a PCM track, then burning it (or better, having a single CD produced)? Then rip it and compare the binaries. That would look more like a proper comparison to me. How can somebody say one is better or worse than the other if they didn't have an original? Or did they? I think I should re-read the Stereo article to check.

  6. If you'd like to compare the different results with my audio tools, you're most welcome.
    Small differences are very easy to detect.

  7. Hello Klaus,
    allow me to ask some questions on your testing method:

    1. According to appendix 1, you generated internal PCM checksums of files from drive 1 with SHA-1 and from drive 2 with SHA-2(256) instead. Any reasons for this change?
    2. In appendix 2, you mention an AR drive offset of the Toshiba of +704. The filenames indicate an offset of 701 instead. Which is correct?
    3. A general question: do drive offsets (be it 30 samples more or less) really have any effect on sound quality other than a few milliseconds of silence, as described in this EAC manual:

    Beside from that, a very interesting and enlightening article, thank you for that.

    Regards, Harald

  8. Hm, this is all the feedback you got? That's quite a pity.
    Do you have any news on this issue, e.g. do "they" have a bug with an offset mistake of 30?
    Once I've finished my new audio system, I'll remember your blog post and do a comparison between audio files with and without an offset error!
    Kind regards,

  9. I wish you hadn't told me about the depressing +30 offset figures returned by (in)Accurate Rip. What a travesty. To try to establish a norm based on a deviation, is merely swapping one offset for another. How daft is that?