
Wednesday, October 10, 2012

The silent death...

Hi folks.


Do you sometimes feel uneasy about the way you handle and administer your valuable music (and video and image) collection?

Do you really know whether all your audio files are still intact, after all those years of moving files here and there?






Many of us have spent hundreds of hours shaping our collections: ripping, ripping again,
getting the tags right, finding the right cover art, and doing this and that. (I know there are other people out there who select their tracks by filenames such as 1.wav. Those people can stop reading at this point ;) )


Once you're finished with your collection (if that ever happens, a collection is a living animal) you do a backup.

Most people simply copy/paste the stuff from one disk to another using MS Explorer.

Over time that gets done many times. You keep copying your data back and forth.

You trust the OS (operating system) to take care of consistency and not to mess with the data.

Others run RAID systems and feel safe doing so. (It can't get worse than that.)


The more sophisticated users use backup tools and/or secure copy tools such as Teracopy.
Most users out there believe that this will be the end of sorrows and discussion.


Fact is, none of the above systems gives you a watertight way to protect your audio treasure.

Time is your enemy.

(As well as laziness and ignorance.)


You should always ask yourself  "Are the files still OK?"

And you'd better do something about it.

Ignoring that question might lead to loss of more data over time than you might expect.



I also used to think:

"My OS backup tools take care of integrity checks etc. It'll work out. Keep the fingers crossed."


I know NOW that this attitude is more than inappropriate.

You really need to actively verify your assumption/belief that everything is gonna be fine.

  

Last weekend I ran an analysis and found that 30 files had become corrupted. Out of a couple of thousand tracks that's not much. Still, we're talking about 30 corrupted CDs.

It really surprised me. I thought all the time I was doing things right.
Obviously the way I handled my collection must have had some weak points.
I really underestimated the situation.


30 files. And guess what: no two of them belonged to the same album.


Obviously I had to re-rip those CDs. That's possible if you still own the CDs.
What if you downloaded your tracks?



Bottom line: I concluded that I need to improve my data handling process.

The goal is to make 100% (or close to it) sure that no data loss can occur.
The whole process must become an automated process,
because laziness and ignorance creep in sooner or later. I can't avoid it.
And I'm sure other people out there are no different.


Perhaps those of you who have read this far are asking yourselves:
"How did he realize that data got corrupted???"



Data can get corrupted on many occasions. The typical copy/paste and backup routines try to avoid it. They are supposed to control overall integrity and usually issue error or deviation messages if a problem is detected.
It's a known fact that e.g. copy/paste is not a safe mechanism. No professional administrator would
use such a method. Backups done with better backup tools are quite safe. Still, over time
it can happen that you overwrite clean backups with corrupted ones.

(Error) messages need to be looked at. I'm pretty sure that 98% of all people ignore the messages issued by backup/copy or converting tools with "I'll look at it later, no time now". Time and ignorance are our enemies.

And this way you get more and more corruption into your database over time.

Don't forget: corruption of data can also occur very silently with aging/worn-out hard disks or weird constellations such as a power outage at the wrong moment, and so forth.
Look at e.g. Amazon and elsewhere. People report HDDs dying after 6 months. I wouldn't trust an HDD that's older than 2-3 years.



What to do about it?


It's pretty simple - once you are aware of the problem. And now you are aware of it.

We just need to add some minor steps to the backup process, assuming you've got one in place.


Besides running your standard backups with rather sophisticated backup tools (and please, once more, a simple copy/paste won't cut it), you need to do the following on top of your backup process:

1. Run an initial verification and cleanup project first.
    Make sure that the existing master and backup media are OK.
2. Continuously check the integrity on a per-file basis on your master HDD
    and on your backup media, before and after a backup.
3. Make a 2nd backup, including an integrity check.

To accomplish the above we need an audio data format which comes with
an embedded integrity mechanism: a checksum.

flac is the preferred option.

.wav files, for example, are no-gos.


Let's go on with flacs.


Flacs come with an embedded MD5 checksum. That checksum is generated over the (decoded) audio data only, which is good: a little problem or a change in the tag area is not critical and has no impact on the checksum mechanism. The checksum is written into the flac file.

The checksum is renewed as soon as you re-encode or transcode the flac data.
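
To see the mechanism for yourself, you can compare the checksum stored in the file header with one recomputed from the decoded audio. The sketch below is only an illustration under a few assumptions: `metaflac` and GNU `md5sum` are installed alongside `flac`, the material is ordinary 16-bit or 24-bit audio, and `flac_md5_matches` is a name made up for this example. In day-to-day use a test decode does this comparison for you.

```sh
#!/bin/sh
# Illustration only: compare the MD5 stored in a FLAC header with the MD5
# of the decoded raw samples. Assumes flac, metaflac and md5sum are on PATH.
# "flac_md5_matches" is a hypothetical helper name.

# flac_md5_matches FILE
# Exit status: 0 = checksums agree, 1 = they differ, 2 = header unreadable.
flac_md5_matches() {
    # MD5 that the encoder wrote into the file header.
    stored=$(metaflac --show-md5sum "$1") || return 2

    # Decode to raw little-endian signed samples on stdout and hash them.
    actual=$(flac -d -c -s --force-raw-format --endian=little --sign=signed "$1" 2>/dev/null |
             md5sum | cut -d ' ' -f 1)

    [ "$stored" = "$actual" ]
}
```

Called as `flac_md5_matches /media/music/album/01.flac && echo OK`, it stays silent for a file whose checksums differ. A stored checksum of all zeros means the encoder never wrote one, and the comparison then reports a mismatch. Because only the audio samples are hashed, editing tags does not change the result.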


If the flac codec detects a mismatch between the MD5 checksum and the decoded audio data while decoding, it issues a corruption message and the decode fails!!!

That corruption message is KEY. You need to look for it and you have to fix the affected track on whichever of your storage devices holds the bad copy.



How do we introduce this into our "backup" AND, not to forget, "restore" procedure?
(We don't want to restore a corrupted collection!)


The flac(.exe) executable (you'll find it for Windows and Linux) can identify that checksum mismatch and issues error messages.

What you do is run a test decoding ( flac -t <file> ) on all files on your master HDD and backup media before and after every backup. (That check can easily take an hour on e.g. 4.5k-5k tracks.)
While doing the test decoding, flac(.exe) will issue an error message if there's a problem with the checksum. You can write a batch file under Windows or a script (see Appendix 2) under Linux.
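
Since the check is a plain decode of every file, it can be spread over several CPU cores. Here is a minimal sketch, assuming GNU or BSD `find` and `xargs` (for `-print0`, `-0`, `-r` and `-P`). The name `check_flacs_parallel` and the default of 4 jobs are arbitrary choices for the example. Whether it is actually faster depends on whether the CPU or the disk is the bottleneck.

```sh
#!/bin/sh
# Sketch only: test-decode every FLAC file under a directory, several at a time.
# Assumes flac is on PATH and find/xargs support -print0, -0, -r and -P.
# "check_flacs_parallel" is a hypothetical helper name.

# check_flacs_parallel DIR [JOBS]
# flac prints its error messages to stderr; the exit status is non-zero
# if at least one file failed the test decode.
check_flacs_parallel() {
    dir=$1
    njobs=${2:-4}
    find "$dir" -type f -iname '*.flac' -print0 |
        xargs -0 -r -n 1 -P "$njobs" flac -s -t
}
```

For example, `check_flacs_parallel /media/music 4 2>/tmp/flac-integrity.log` collects the messages in a log file. With several jobs writing at once, the lines in that log can appear in any order.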

There's also another tool you might use:

The dbPoweramp Reference converter offers a function called "Test Conversion". It basically does what the flac binary does when run with the -t option, so you can use that one as well. The dbPoweramp Batch Converter lets you check your entire disk at once, which is most convenient. dbPoweramp also offers a feature called Move Destination File on Error that automatically moves the corrupted files to a predefined target directory.




That's basically all we need to do.

Yep. That's about it.
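
Put together, the routine of checking the master, backing up, and then checking the backup could be sketched roughly as follows. This is one possible arrangement, not a finished tool: it assumes `flac` and `rsync` are on the PATH, and `list_bad_flacs` and `guarded_backup` are names invented for the example.

```sh
#!/bin/sh
# Sketch only: check the master, back up, then check the backup.
# Assumes flac and rsync are on PATH; both function names are hypothetical.

# list_bad_flacs DIR
# Print the path of every FLAC file under DIR that fails a test decode.
list_bad_flacs() {
    find "$1" -type f -iname '*.flac' -exec sh -c '
        for f in "$@"; do
            flac -s -t "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# guarded_backup SRC DST
# Exit status: 0 = all fine, 1 = master has corrupt files (nothing copied),
#              2 = rsync failed, 3 = backup has corrupt files.
guarded_backup() {
    src=$1
    dst=$2

    bad=$(list_bad_flacs "$src")
    if [ -n "$bad" ]; then
        printf 'Corrupt files on master, backup skipped:\n%s\n' "$bad" >&2
        return 1
    fi

    # Copy the contents of SRC into DST (archive mode, nothing is deleted).
    rsync -a "$src"/ "$dst"/ || return 2

    bad=$(list_bad_flacs "$dst")
    if [ -n "$bad" ]; then
        printf 'Corrupt files on backup:\n%s\n' "$bad" >&2
        return 3
    fi
}
```

A call would look like `guarded_backup /media/music /media/backup/music`. The check after the copy matters because rsync, by default, skips files whose size and timestamp already match, so a file that went bad on the backup disk is not replaced by the copy itself.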

A pretty simple measure, though a very powerful one. I'm sure my collection is kept much safer now.


I guess 99.9% of all collections out there have a problem. You might consider doing something about it.


If you follow my advice, I'd be really interested to see you report back your corruption rate (and associated backup strategy). ;)

Good luck.

Cheers
\Klaus

##########################################

Appendix 1:

Backup advice:

1. Don't run simple copy/paste backups. Use backup tools, and look for the recommended settings of these tools!!!
    Usually you'll find numerous parameters which don't mean much to you in the beginning. Later on you
    usually realize why these parameters were introduced. Spend a little time on the subject.
2. Introduce incremental backups, which keep the original data as long as possible and store the "delta" data
    at a different place on the same disk.
    This way you save quite some space. You can have several backup cycles on one disk.
3. Use at least 2 backup disks, stored at different locations.
4. Buy disks which are used in the professional area. Cheap consumer stuff is not recommended.
5. You don't need the fastest disks (higher wear). You need the most reliable ones.
6. Don't use old disks (e.g. ones that were just replaced by your brand-new SSD) as backup media.
7. Check the data integrity separately, see above.
8. Automate the process as much as possible and keep the logfiles.
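
One possible way to get several backup cycles onto one disk (point 2) is rsync's `--link-dest` option: every run creates a complete-looking snapshot directory, but files that have not changed are hard-linked to the previous snapshot instead of being stored again. The sketch below assumes `rsync` is installed and that the backup root is given as an absolute path. The function name, the `latest` symlink, and the timestamp format are choices made for this example only.

```sh
#!/bin/sh
# Sketch only: space-saving snapshots with rsync --link-dest.
# Assumes rsync is on PATH and ROOT is an absolute path.
# "snapshot_backup" and the "latest" symlink are hypothetical conventions.

# snapshot_backup SRC ROOT [STAMP]
# Creates ROOT/STAMP (default: current date and time) and points
# ROOT/latest at it. Unchanged files are hard links into the previous snapshot.
snapshot_backup() {
    src=$1
    root=$2
    stamp=${3:-$(date +%Y-%m-%d_%H%M%S)}

    mkdir -p "$root" || return 1

    if [ -e "$root/latest" ]; then
        # A previous snapshot exists: link unchanged files against it.
        rsync -a --link-dest="$root/latest" "$src"/ "$root/$stamp"/ || return 1
    else
        # First run: plain full copy.
        rsync -a "$src"/ "$root/$stamp"/ || return 1
    fi

    # Remember the newest snapshot for the next run.
    ln -sfn "$stamp" "$root/latest"
}
```

Usage would be something like `snapshot_backup /media/music /media/backup/snapshots`. Keep in mind that hard-linked snapshots share a single stored copy of each unchanged file, so they save space but do not protect against that one copy going bad. The separate integrity check (point 7) is still needed.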

Tools:

Full Backup ( all partitions and bootsector)
Windows 7 -  Backup and Restore
Acronis - The Free WD Version
dd (Linux - commandline)

and more

Incremental Backup
Windows 7 -  Backup and Restore
rsync (Linux and Windows - commandline - also remote via network! (ssh))

and more

Note: The W7 tools are by now able to compete with commercial software.
IMO there's no need to go for Norton or Acronis.

Secure Copy ( with CRC check and error logging on failed transfers)
Teracopy
rsync (Linux and Windows - commandline)


Advice: It's always recommended to use a defragmented disk.
Copying data back and forth all the time, or any other job you run on your data, such as adding RG (ReplayGain) tags or similar, gets you a lot of fragmentation on the disk. That fragmentation is gone once you back up your disk to another disk. If you can easily swap your master disk with the backup disk, use your backup disk as the new master.
This way you always run a rather defragmented disk without running an annoying defragmentation process,
which puts a lot of pressure on your HDD.


###############################################


Appendix 2:

Since I'm at home in the Linux world, I've written a simple one-liner (for bash) that runs the flac integrity check over your entire hard disk(s):

Open a terminal first.

**** copy/paste the command below as one line ****

find / -type f -iname "*.flac" -print0 | while IFS= read -r -d '' j ; do flac -s -t "$j" 2>>/tmp/flac-integrity.log ; done

************************************

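That'll take some time (hours), at 1 to 2 s per file.

You can replace "find /" with e.g. "find /media/music" to restrict the scan to a specific music directory.

You'll find the scan result in /tmp/flac-integrity.log. The command appends to that file, so delete an old log before you start a new run.

To test the above you might copy a CD to /tmp first and then replace "find /" with e.g. "find /tmp"

You can run a:

grep -i "error" /tmp/flac-integrity.log | wc -l

That'll tell you whether any problems were found, and roughly how many. (The -i makes the match case-insensitive, so messages written in upper case are counted as well.)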


6 comments :

  1. Hey Klaus,

    thanks for the heads-up! Just checked my collection (203 GB, 14k files, 70% flac, 30% mp3) with the dBpoweramp test. Only 3 files corrupted, 1 of which I no longer need and 2 that I can re-rip.

    Now I have to buy an external HDD so that I can do the backup^^. At the moment I just put a 2nd disk in my NAS, let it do its mirror job and take the 2nd back out...

    I stumbled upon your blog via diyaudio. Funny thing is I run a Squeezebox Touch as well and just 5 days ago ordered the DDX320v2 :D

    So you will be hearing from me in the near future about your mods, which I am looking forward to doing, and specifically about how you have hooked up your SBT to the DDX.

    Schöne Grüße
    Ragnar

  2. Great bit of info there — I wasn't aware that flacs had the md5 sum embedded and were so easy to test. Monthly cron job duly set up. Cheers!

    James

  3. Hiya, cheers for the above. I'm somewhat of a beginner at PC stuff, so I will read through again and give it a bash :-) as I am slowly realizing the source file is rather important!

    I have always mirrored my master everyday music HDD to back it up, so all the bits stay the same (or so one thought).

    Would, say, a RAID5 backup play with the bits of a FLAC file, or would they stay the same and just be spread out more?

    cheers for any thoughts
    thanks again
    Mark

  4. Hi,
    I got curious about this and tried it for myself with dbamp. Out of 39'000 flac files, found 4 corrupted.
    Re-ripped one of them as flac and tested it ok with dbamp.
    I then compared the wav version of the corrupted file with the wav of re-ripped file (both obtained with flac frontend) and EAC (compare wav) found them to be identical. Does this mean that corruption originated from the flac tags of the data?


  5. The wav should have the same size as a decoded flac.

    To compare two files, though, the md5sums of both files have to be compared.

    The files can have the same size but differ in content.

    If a flac is broken, there's a mismatch between the md5sum stored in the file and the md5sum just generated from
    the audio content.

  6. Hi Klaus,
    after reading this blog, I am now AWARE of the corruption problem as well, but what can I do if all my musical treasure is ripped (via dbPoweramp) to aif, 'cause I am living in the Apple world?

    By the way, thanks for all that absorbing blog stuff and the DETAILED explanations, always fun to read.

    Holger
