
Disaster Recovery for Exchange Server in 30 minutes!

  • Introduction
  • The “Official Microsoft method” for Exchange disaster recovery
  • The “fast way” for Exchange disaster recovery
    • Exchange Server design
    • Disk imaging methodology
    • Backup methodology
    • Recovery procedure
  • Conclusion


Introduction
What would be your reaction if I told you I intended to format the C partition on your production Microsoft Exchange server to run a quick test?  Your first reaction would probably be to grab the nearest blunt object for use on my head before I even got a chance to explain myself.  Unfortunately, this is no joke: the dire consequences of viruses, incompatible patches, or other malicious events on a company’s Exchange server are all too often the worst living nightmare of any Exchange admin and their bosses.  In this article, I’ll show you how to protect your Exchange server with a method that recovers an Exchange server in 30 minutes!  Some of you at this point may wonder, “But I back up my Exchange database, why do I even care about this?”  Unfortunately, an Exchange database is useless without the original Exchange server in the original domain environment.  If anything happened to the OS or the Exchange application that left the server unrecoverable, the data would be worthless, because you can’t just mount that database on any old Exchange server.  Microsoft Exchange recovery is one of the trickiest things to execute, and this article is meant to make it as painless and reliable as possible.
 
 
The “Official Microsoft method” for Exchange disaster recovery
If you’ve ever hired a Microsoft certified consultant or worked in a large corporation running Exchange, you’ve probably heard of or are using the “official Microsoft” method for Exchange disaster recovery.  Unfortunately, anyone thinking about or running this “official” method is a glutton for punishment.  The method involves maintaining a mirrored parallel universe of your Domain/AD and Exchange infrastructure.  It involves building a backup domain controller on your existing network, and then putting it in an isolated LAN and promoting it to the PDC.  Then you must carefully and meticulously build an Exchange server on that isolated LAN from scratch without making a single mistake in spelling using the identical settings of your production environment.  Only then can this parallel universe be used in the event of a catastrophic failure of your production Exchange environment.  This procedure is complex, prone to error, and expensive.  Needless to say and without getting further into the details of this clumsy formal methodology, you don’t want to go down this road because there is a better way.
 
Background note:  I’ve personally seen a highly paid consultant spend 2 weeks implementing and documenting this disaster recovery plan for my company.  In the end, that same consultant could not replicate that environment using the documentation he wrote himself and finally gave up after a day.
 
 
The “fast way” for Exchange disaster recovery
As a result of my personal experiences above, I refused to believe that this was what I had to live with in order to have Exchange recoverability.  Believe me, I got plenty of flak for it from my colleagues and from the very same consultant who couldn’t follow his own documentation, the documentation he expected us to use in case of a disaster.  They insisted that this was the “official” way and how all the big corporations do it.  At the time, in the year 2000, I was just beginning to use system imaging (a process of copying an entire hard drive or partition into a single file called an image) for a large deployment of new workstations running Windows 2000 and all new applications.  I soon began to wonder: why not use system imaging for servers, and Exchange in particular?  It seemed a daunting challenge because these were high-end servers running complex SCSI RAID 5 configurations, and it seemed like imaging wouldn’t work.  As it turns out, since hardware RAID is completely transparent to the OS and applications, Norton Ghost or Power Quest Drive Image worked just as well on servers as they did on workstations.  Once I had this working, I built a test domain and Exchange server, took an image of the OS and applications partition, and managed to restore it in less than 30 minutes after formatting the C drive (the data resided elsewhere).  I was confident that this would restore the basic Exchange server, but I wondered whether the re-imaged Exchange server would recognize a more recent database if additional emails were sent after the system image was taken.  I tried this in the lab and it was indeed possible: the image-restored Exchange server would mount a newer database with more recent data.  This meant that even if a month’s worth, or any amount, of updates had been made to the database through everyday usage, the restored Exchange server would come up with all the old and new data.
I realized I had come across the ultimate disaster recovery procedure for servers, and all that was needed was some refinement of the process.
 
The following refinements are what I came up with:
  • Exchange Server design
  • Disk imaging methodology
  • Backup methodology
  • Recovery procedure

Exchange Server design
To make this recovery method feasible, a fundamental design must be followed on the Exchange server.  Data must reside on a separate partition (physical preferred, but logical is OK) from the OS and applications partition.  The last thing you want to image is 1 gigabyte of OS/Apps plus 100 gigabytes of data on a single partition.  You lose the granularity and convenience of being able to recover the OS and applications without affecting the data.  For existing servers that already have data mixed in with the OS and applications, you could add an additional storage device and move the data store and log files to separate partitions on the newly added device.  Ideally, one would go by the following guidelines for maximum safety, scalability, manageability and performance.
 
  • Put the OS and Applications on the C partition.
  • Put the log files (AKA transaction logs) on the D partition.
  • DO NOT put the log files on the same physical device as the Exchange data store (database).  If that device failed and you lost the database, you would also lose the ability to recover, from the Exchange transaction logs, data created after the tape backup.  I’ve seen bad designs that do this, and their owners paid dearly for it, losing a whole day’s worth of data because the storage device that housed both the data store and the log files failed.  Their tape backups managed to restore everything up to the previous night, but losing a day’s worth of emails in a large corporation is potentially a “resume producing event”.
  • Put the data store on its own physical drive or block-level storage device such as a Fibre Channel or iSCSI (a new IP-based storage standard) SAN (Storage Area Network).  Better yet, use Exchange 2000, break the data store up into multiple stores along departmental lines, and put them on separate physical RAID partitions.  The more separate physical partitions you use, the faster it performs.  This is because a RAID partition can only seek one thing at a time; servicing more means jumping around at a heavy cost to performance.  Two physical devices can seek two things at once without jumping back and forth between two tasks, and every physical partition you add gives you another simultaneous data seeker, for a roughly linear gain in performance.
  • In practice, this can be accomplished easily and cheaply on typical midsize servers with hardware RAID and six drives.  Three pairs of drives can be configured for RAID 1 mirroring to create three physical partitions.  The first partition can be broken down into logical C and D partitions for OS/Apps and log files respectively, and the other two physical partitions can be set as the E and F partitions for housing multiple data stores.  Higher-end setups can use an external SAN with a similar approach to RAID layout, which opens the possibility of clustering your Exchange 2000 server.  Do not just lump all 6 drives into one massive RAID 5 array and chop it up into multiple logical partitions.  That method is cheaper because only one drive is lost to redundancy, but it comes at a horrible expense to seek time (a threefold loss, because all 6 drives must act in unison instead of as three independent arrays).  Hardware and drives are so cheap now that it isn’t worth the savings.  In general, database workloads are best described as death by a thousand tiny requests, so simultaneous seek capability is heavily favored over the improved sequential transfer rates that RAID 5 offers.
  • For Exchange 2000, put three data stores on the E partition and three data stores on the F partition.  Doing this makes recoverability and maintenance a snap.  If you put one large data store on one partition, you cannot safely use more than half the space on that partition.  If you need to compact a database during maintenance, or repair a data store that has become corrupted, the free space on that partition must be at least equal to the size of the data store you are compacting or repairing.  With three equally sized data stores on one partition, you only need to reserve 25% of the partition for maintenance or repair jobs.  Database corruption is common with any sort of database, and having 6 data stores makes repairs 6 times faster and easier because the stores are 6 times smaller.  Additionally, the 5 remaining stores are not affected while the 6th data store is being compacted or repaired, which means zero downtime for most of the company.  I highly recommend Exchange 2000 or later because of this feature.
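
The free-space and drive-count figures in the guidelines above can be checked with a little arithmetic.  The following Python sketch is only an illustration: it assumes equally sized data stores that together fill everything except the reserve, and the function names are made up for this example.

```python
def reserve_fraction(stores_per_partition: int) -> float:
    """Fraction of a partition to keep free so that one store can be
    compacted or repaired, assuming equally sized stores fill the rest."""
    if stores_per_partition < 1:
        raise ValueError("need at least one data store")
    # The free space must equal one store, so the partition is split into
    # n store-sized slots plus one reserve slot: n + 1 slots in total.
    return 1 / (stores_per_partition + 1)


def usable_drives(total_drives: int, layout: str) -> int:
    """Drives' worth of usable capacity for the two layouts discussed."""
    if layout == "raid1_pairs":
        return total_drives // 2   # each mirrored pair yields one drive
    if layout == "raid5":
        return total_drives - 1    # one drive's worth goes to parity
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for n in (1, 3):
        print(f"{n} store(s) per partition: reserve {reserve_fraction(n):.0%}")
    print("Three RAID 1 pairs:", usable_drives(6, "raid1_pairs"), "drives usable")
    print("One 6-drive RAID 5:", usable_drives(6, "raid5"), "drives usable")
```

One store per partition gives a 50% reserve and three stores give 25%, matching the figures above.  The drive counts show what the RAID 1 layout costs in capacity: three usable drives instead of five.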

 
Disk imaging methodology
Before you start, be sure your Exchange server is in full operational order, with the complete database store structure, anti-virus, backup agents, and anything else you need installed.  To image a system, you generally need to dump the image onto a separate physical partition from DOS mode, because you cannot image a boot partition while that partition is loaded.  The easiest way to do this is to dump the image to a network file share.  To avoid writing an entire 10-page chapter on how to create a TCP/IP network boot disk with SMB client capabilities, and to save you a ton of time, I’m going to say just one word: bootdisk.com.  www.bootdisk.com is the one-stop place where you can freely download pre-made bootable images that work with pretty much all common network adapters.  From there, you only need to make a few minor modifications to the drive-mapping batch file to mount your network share on a drive letter.  I would then recommend that you make a bootable CD-ROM image of the modified floppy disk for vastly improved boot times.  Then you simply boot from the CD, which loads the network drivers and automatically maps the network share.  That network share should also contain a recent copy of Norton Ghost or PQDI (Power Quest Drive Image).  From there, you simply run Ghost or PQDI and dump the C partition onto the network share.  Additionally, I would create image backups of the log file and data partitions as well, with just the database structure and no data.  Although imaging the data partitions is not mandatory, it will save you a lot of trouble by recreating the entire partition and database directory structures when doing a bare metal restore (starting from scratch with a new piece of identical or wiped hardware) of your Exchange server.  Note:  The bare-bones database structure partition images will be very small because they are almost entirely compressible.
 
Newer versions of Ghost and PQDI support a hot backup feature that tracks changes to the OS and applications while the system is operational.  Note that you must first create an initial image of the C partition in DOS mode.  After that, you can track changes on a production server even while it is running by backing up all deltas to the OS and applications with this feature.  This is valuable because it is not feasible to take your server down just to do a cold image backup every other week.  Once all of this is in place, you can put your Exchange server into production and start populating the Exchange data stores.
 
Backup methodology
Up to this point, everything has been focused on backing up the OS and applications partition using disk imaging software.  Backing up the data is equally important for an Exchange server.  If you are serious about performing hot backups of an Exchange server, you must use a reliable third-party enterprise solution that hooks into Exchange with a backup and restore agent.  One of the better solutions I have seen is from Legato.  Some other solutions I have worked with were utter nightmares that never worked consistently and couldn’t always restore successfully, which meant someone’s head was going to roll.  Since this article is not specifically about data backups, I will move on.  For large enterprises that can’t afford to lose even an hour’s worth of data, I would recommend going the extra mile and continuously copying the log files onto a separate physical device with an automated hourly batch process.  Tape backup covers you up to the previous night, and the log files can cover you up to the last few minutes before a database disaster.
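
As a rough illustration of the hourly log-copy job, here is a minimal Python sketch.  It is not a tested Exchange tool: the function name and the `*.log` pattern are assumptions made for this example, and a log file the server currently has open may fail to copy or may be copied incomplete, which this sketch does not handle.

```python
import shutil
from pathlib import Path


def copy_new_logs(log_dir: Path, dest_dir: Path, pattern: str = "*.log") -> list[str]:
    """Copy log files that are missing from dest_dir or have changed since
    the last run.  Returns the names of the files copied on this run."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(log_dir.glob(pattern)):
        if not src.is_file():
            continue
        dst = dest_dir / src.name
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            # Treat the copy as current when the size matches and the
            # timestamps agree to within 2 seconds (coarse filesystems).
            if (dst_stat.st_size == src_stat.st_size
                    and abs(dst_stat.st_mtime - src_stat.st_mtime) < 2):
                continue
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(src.name)
    return copied
```

Scheduled to run once an hour, a call such as `copy_new_logs(Path(r"D:\logs"), Path(r"\\backupserver\exchange-logs"))` would keep a second copy of the logs on another device.  Both paths are hypothetical placeholders.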
 
Recovery procedure
There are several types of disasters that can happen to an Exchange server, such as database corruption and corruption of the OS or applications on the server.  In some cases you may be able to recover from database corruption by running the repair operation on the affected data store, or fix a severe OS or application error on the Exchange server by calling Microsoft.  However, in the event that a system completely dies and neither of the above options works, or you simply lose an entire machine including all of the OS, applications, log files, and data due to some catastrophic disaster, following the guidelines above is what is going to save you.  In the event of a catastrophe, you would run the following procedure.
 
  • If Exchange server can’t boot, or the Exchange Services refuse to load normally, proceed to the following steps.
  • Boot the system with the DOS disk with network support (the same one you used to make the image of the C drive).
  • Restore the image of the good Exchange server from the network share where you keep all server image backups onto the corrupted C partition, and reboot when complete.
  • If you were maintaining hot update backups with Power Quest Delta Deploy or Norton’s Ghost, apply those updates to your OS and Applications.  Reboot if needed.
  • If you had not lost the data partition or corrupted the database, then all should be well at this point.  The Exchange server should recognize and mount the database automatically.  This whole process up to this point can be done in 30 minutes.
  • If you lost the data as well, or you are doing a bare metal restore, you need to re-image the data partitions with the bare data structure images.  If you didn’t bother to make those, you will need to create the identical partitions and directory structure manually, which may be very difficult.  (If you had put the data on a separate physical device, the odds of losing it at the same time as an OS/App failure are slim.)  Reboot if you re-imaged the log and database partitions.
  • Once rebooted, your Exchange server will be fully operational with an empty database.
  • Invoke the data recovery application and proceed to recover data from tape.
  • If you were wise enough to have copied the log files to another physical server/device before the disaster, now would be the time to copy those log files back and apply those logs to recover data up to the last couple of minutes before the Exchange server crashed.
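
The branching in the procedure above can be summarized as a small decision function.  This Python sketch only restates the steps as a checklist generator.  The function and parameter names are invented for illustration, and it does not drive any real imaging or backup tool.

```python
def recovery_steps(os_bootable: bool, data_intact: bool,
                   has_hot_updates: bool, has_structure_images: bool,
                   has_log_copies: bool) -> list[str]:
    """Return the ordered recovery checklist for a given failure scenario."""
    steps = []
    if not os_bootable:
        steps.append("Boot from the DOS disk with network support")
        steps.append("Restore the C image from the network share and reboot")
        if has_hot_updates:
            steps.append("Apply the hot-update deltas to the OS and applications")
    if data_intact:
        # Data partition survived: the restored server mounts the database.
        steps.append("Let Exchange mount the existing database")
        return steps
    if has_structure_images:
        steps.append("Re-image the log and data partitions with the bare structure images and reboot")
    else:
        steps.append("Recreate the partitions and directory structure manually")
    steps.append("Recover the data from tape with the backup application")
    if has_log_copies:
        steps.append("Copy the saved log files back and apply them")
    return steps


if __name__ == "__main__":
    # Scenario: the OS is corrupted but the data partitions are fine.
    for step in recovery_steps(os_bootable=False, data_intact=True,
                               has_hot_updates=True, has_structure_images=True,
                               has_log_copies=False):
        print("-", step)
```

The common case, a dead OS with intact data, yields the short 30-minute path.  The bare metal case adds the partition re-imaging, tape restore, and log replay steps.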

 
Note that it is rare that you would need to do a bare metal recovery of an Exchange server, but following the above procedure ensures maximum recoverability of your business-critical Exchange servers in any event.  On a last note, you should keep an off-site tape copy of your images and database.  This goes for any other critical server or application.


Conclusion
The above procedure is the easiest and most reliable way of recovering from an Exchange disaster.  However, you should note that the concepts apply to any other server or application.  Some may ask “But Microsoft doesn’t support this, do they?” or object that “Microsoft does not support disk imaging”.  The truth is, I use disk imaging on all my servers and I have never been turned down for support at Microsoft, nor do they even ask whether you are using disk imaging at all.  I even maintain a library of generic system images of every type of system configuration I have so that I can deploy or redeploy any new server rapidly.  Microsoft support is one of the more reasonable deals out there: where else can you get unlimited attention until an issue is resolved for $250?  Additionally, Microsoft has already announced its own image deployment strategy with a new API called ADS (Automated Deployment Services), for which major players have already pledged their support.  Bottom line, disk imaging makes eminent sense and can be applied to any type of server or workstation.