Snapshots in VMware and how to survive them.

We survived a rather scary day on Sunday with one of our main file servers.  We had virtualized this server only several weeks earlier, and it contains a large amount of critical data. I’m a huge fan of VMware, and their products have transformed the way we do IT at Watermark, but yesterday was not fun.

Sunday morning I received a call that the server wasn’t responding, and on further review I noticed that the server’s data store was completely out of space.  The server would start for a few minutes, but then fail with the error “There is no more space in the redo log for servername-00002. You may be able to continue by freeing disk space on the relevant partition.”  This was the beginning of our lesson on VMware snapshots.  (Side note: we have Gold Support for VMware, which you would think would be good, but no. If you want support outside the hours of 6am-6pm M-F, you need Platinum Support. Nice.)  But I digress.

Last Saturday we had taken a snapshot and then forgotten about it.  When you take a snap in VMware, the system puts the original VMDK (virtual disk file) into a “holding pattern” and begins writing changes to a new virtual disk file, in our case the servername-00002 file.  The best practice, of course, is to take a snapshot, make your changes, and then immediately delete the snapshot, at which point all of the changes are written back into the original “holding pattern” VMDK and all is well.  Unfortunately, the system doesn’t do anything to remind you that the snap still exists if you forget to do this.  That seems to me like something VMware should implement: a reminder that snaps are growing like crazy and about to take you out at the knees.  By the time we discovered ours, the new 00002 file had grown to 21 gigabytes and had filled up all of the available disk space.
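A reminder like that could be approximated with a small script run on a schedule. The following is only a minimal sketch, not anything VMware provides. It assumes the data store is visible as an ordinary directory on the machine running the script, and that snapshot files have names like servername-000002.vmdk or servername-000002-delta.vmdk. The function names and both thresholds (5 GiB per snapshot file, 10% free space) are made up for illustration.

```python
import os
import re
import shutil

# Assumed naming pattern for snapshot disks, e.g. "servername-000002.vmdk"
# or "servername-000002-delta.vmdk". Adjust it to what your data store shows.
DELTA_PATTERN = re.compile(r"-\d{5,6}(-delta)?\.vmdk$")


def find_snapshot_deltas(datastore_path):
    """Return a list of (path, size_in_bytes) for files that look like snapshot disks."""
    found = []
    for root, _dirs, files in os.walk(datastore_path):
        for name in files:
            if DELTA_PATTERN.search(name):
                full_path = os.path.join(root, name)
                found.append((full_path, os.path.getsize(full_path)))
    return found


def check_datastore(datastore_path, max_delta_bytes=5 * 1024**3, min_free_fraction=0.10):
    """Return human-readable warnings about large snapshot files and low free space."""
    warnings = []
    for path, size in find_snapshot_deltas(datastore_path):
        if size >= max_delta_bytes:
            warnings.append(
                f"snapshot file {path} has grown to {size / 1024**3:.1f} GiB"
            )
    usage = shutil.disk_usage(datastore_path)
    if usage.total and usage.free / usage.total < min_free_fraction:
        warnings.append(
            f"only {usage.free / 1024**3:.1f} GiB free on {datastore_path}"
        )
    return warnings
```

Calling check_datastore on the data store directory returns a list of warning strings, and an empty list means nothing looks alarming. Emailing that list to yourself once a day would have caught our forgotten snap well before it filled the disk.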

So our immediate course of action was to keep the server stopped (it wouldn’t run for more than a few minutes anyway before falling over), and get a complete copy of our file system from the SAN as a backup.  After copying nearly 60 gig from the SAN to a different location, it was time to attempt removal of the snapshot.

We went into Virtual Center, opened the snapshot manager under Snapshots, saw the snapshot from last Saturday that we wanted to remove, and promptly removed it.  The task started and then hung at 95% for about fifteen minutes, at which point we received a message that the “Operation Timed Out.”

Now, here is the kicker. You would think that a message like that would be a prompt to try again, but after lots of research it appears that the process has not really timed out at all.  Because the “tracking changes” VMDK is so large, it has a lot of data to write back into the original.  So in reality, the process is still running in the background and you just need to give it time to finish.  In fact, many people have said that reissuing the “remove snapshot” command will kill your data.  Not good, VMware.

Fortunately, we found this information out before trying to remove the snap again.  Sure enough, two hours later, the process finished running and we were back to our original VMDK file.  The server started up with plenty of storage available again and all is well.
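If you would rather not sit and stare at the data store while the removal grinds along, the waiting itself can be scripted. This is a rough sketch under one big assumption: that the snapshot file disappears from the data store once its changes have been written back, which matches ending up with only the original VMDK. The function name, the one-minute polling interval, and the four-hour limit are hypothetical. The script only watches the file and never reissues the remove command.

```python
import os
import time


def wait_for_commit(delta_path, poll_seconds=60, timeout_seconds=4 * 3600):
    """Poll until delta_path no longer exists.

    Returns True if the file vanished before the deadline, False otherwise.
    This only observes the file; it does not touch the snapshot itself.
    """
    deadline = time.monotonic() + timeout_seconds
    while os.path.exists(delta_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

A False result just means the file was still there when the limit ran out. It says nothing about whether the removal failed, so the safe reaction is to keep waiting or investigate, not to hit remove again.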

Like I said earlier, I absolutely love VMware, but I will not be using the snapshot capability in the future.  I think I will stick with EqualLogic snapshots, which seem to be faster AND safer.

I’d love to hear your comments on what we did right/wrong and how VMware has worked for you.

Service Day at Watermark

Yesterday, several thousand folks from Watermark Community Church took the day off from “going to church” so that we could “be the church” to the city of Dallas.  We boarded school buses until they were all full, and then the rest of us drove to schools all over Dallas to serve.

My oldest daughter and I headed out to Hamilton Park Elementary where we joined an army of volunteers.  We cleaned desks, washed windows, cleaned marker boards, picked up and emptied trash… all while having a great time with other folks that we are connected with at Watermark.  It was great to see the body of Christ come together.  You can read more at the Dallas News article or read people’s stories at the Watermark External Focus blog.

Twitter updates for the week of 2009-03-29

  • @sgoodger not if I can avoid it. That is why there are other stores in the mall. Dare you to ask to try something on…. in reply to sgoodger #
  • @GregAtkinson Along with the rest of the universe… can’t win if you are running in the middle of the pack… (also can’t lose) #unc in reply to GregAtkinson #
  • and with Louisville, there goes my bracket… #
  • http://twitpic.com/2kmte – Had so many people show up for service day that we ran out of buses. Headed out by car. #
  • Wonders if lots of babies are conceived when people turn the lights out for Earth Hour, is it really counterproductive? #
  • Dear Microsoft and ATI. Upgrading a video card driver on Vista should not require an IT or engineering degree. K? Thks. Bye. #fb #
  • Girls are playing with their male cousin. He wants to play Power Rangers. They have no concept. Morgan’s superpower is bubble blowing. #
  • Great way to spend the day. http://post.ly/Bvy #
  • @LesBrown very in reply to LesBrown #
  • @clintmiller Accuvant. Those are my boys… Tell Shawn Bowman I said hello. I’ve used them for Aruba, extreme, juniper. in reply to clintmiller #
  • @clintmiller I have a few great Juniper resellers if you are interested. in reply to clintmiller #
  • Trying out the new posterous site instead of Twitpic. So far it rocks! http://post.ly/BgJ #
  • http://twitpic.com/2hj5h – Lots of memory slots full of memory for vmware. Room to grow now. #
  • http://twitpic.com/2hf4v – @paulrhoades with lots of memory for ESX. #
  • @LesBrown 2000 calories of bliss. :) in reply to LesBrown #
  • Got virtualcenter server FINALLY upgraded to ver 4 today. Tonight, doubling the memory in the esx servers. #
  • @mcfchr looks like a great package for the price (macheist) in reply to mcfchr #
  • wondering if Shelby support could be worse… thinking no. #
  • Hanging at watermark with the community group guys. #
  • @robwthomas I can’t believe you have tracked the number of gallons of milk. That is staggering. in reply to robwthomas #
  • RT @cyberentomology: wondering if the president’s circular answers change direction south of the equator… Pretty funny. #
  • @Trent_Armstrong I got stuck in a meeting and didn’t get to come down. How did the technology/wireless network hold up? in reply to Trent_Armstrong #
  • Two ways to be a failure. Being too weak of an individual to help a team or being too strong of an individual to be part of a team. #
  • @LesBrown funny movie. Girls loved it. in reply to LesBrown #
  • thinks it IS tricky to rock a rhyme that’s right on time… #fb #

Great way to spend the day.

Having a family gathering at your place is always a great excuse to grill. Even when it is a little chilly outside.

Posted via email from Scott’s posterous

Trying out the new Posterous service instead of Twitpic. So far it rocks!

I’ve been a big fan of Twitpic for a while, using it to email pictures that will update my twitter status, etc.  But today, I got word of a new service (new to me anyway) called Posterous.  Not only is it very easy to use, but you simply send an email to post@posterous.com and attach ANYTHING (photos, mp3s, whatever) and it will be automatically posted wherever you want… twitter, facebook, flickr, blog, the list goes on and on.  I’m going to give it a shot and see how it goes.  It’s a free service.  To sign up, all you have to do is send an email to post@posterous.com and they will reply with your info.  Check out my stuff @ watermarkgeek.posterous.com

Posted via email from Scott’s posterous