Replica Clusters behind a NAT

October 10, 2013, 11:06 pm


When a Hyper-V Replica Broker is configured in your DR site to accept replication traffic, Hyper-V, along with Failover Clustering, intelligently percolates these settings to all the nodes of the cluster. A network listener is started on each node of the cluster on the configured port.

While this seamless configuration works for the majority of our customers, we have heard from customers who need to bring up the network listener on a different port on each replica server (e.g. port 8081 on R1.contoso.com, port 8082 on R2.contoso.com and so on). One such scenario is placing a NAT in front of the Replica cluster, with port-based rules that redirect traffic to the appropriate servers.

Before going any further, a quick refresher on how placement and traffic redirection work in Hyper-V Replica.

1) When the primary server contacts the Hyper-V Replica Broker, it (the broker) finds a replica server on which the replica VM can reside and returns the FQDN of the replica server (eg: R3.contoso.com) and the port to which the replication traffic needs to be sent.

2) Any subsequent communication happens between the primary server and the replica server (R3.contoso.com) without the Hyper-V Replica Broker’s involvement.

3) If the VM migrates from R3.contoso.com to R2.contoso.com, the replication between the primary server and R3.contoso.com fails as the VM is unavailable on R3.contoso.com. After retrying a few times, the primary server contacts the Hyper-V Replica Broker indicating that it is unable to find the VM on the replica server (R3.contoso.com). In response, the Hyper-V Replica Broker looks into the cluster and returns the information that the replica VM now resides on R2.contoso.com. It also provides the port number as part of this response. Replication is now established to R2.contoso.com.

It’s worth calling out that the above steps happen without any manual intervention.

In a NAT environment where port-based address translation is used (i.e. traffic is routed to a particular server based on the destination port), the above communication mechanism fails. This is because the network listener on each of the servers (R1, R2, ... Rn.contoso.com) comes up on the same port. As the Hyper-V Replica Broker returns the same port number in each of its responses to the primary server, the NAT device cannot tell which server an incoming request is meant for.

Needless to say, if there is a one-to-one mapping between the ‘public’ IP addresses exposed by the NAT and the ‘private’ IP addresses of the servers (R1, R2, ... Rn.contoso.com), the default configuration works fine.

So, how do we address this problem? Consider the following 3-node cluster with these names and IP addresses: R1.contoso.com @ 192.168.1.2, R2.contoso.com @ 192.168.1.3 and R3.contoso.com @ 192.168.1.4.

1) Create the Hyper-V Replica Broker resource using the following cmdlets with a static IP address of your choice (192.168.1.5 in this example)

$BrokerName = "HVR-Broker"
Add-ClusterServerRole -Name $BrokerName -StaticAddress 192.168.1.5
Add-ClusterResource -Name "Virtual Machine Replication Broker" -Type "Virtual Machine Replication Broker" -Group $BrokerName
Add-ClusterResourceDependency "Virtual Machine Replication Broker" $BrokerName
Start-ClusterGroup $BrokerName

2) Hash table of server name, port: Create a hash table that maps each server name to the port on which the listener should come up on that server.

$portmap = @{"R1.contoso.com"=8081; "R2.contoso.com"=8082; "R3.contoso.com"=8003; "HVR-Broker.contoso.com"=8080}

3) Enable the replica server to receive replication traffic by providing the hash table as an input

Set-VMReplicationServer -ReplicationEnabled $true -ReplicationAllowedFromAnyServer $true -DefaultStorageLocation "C:\ClusterStorage\Volume1" -AllowedAuthenticationType Kerberos -KerberosAuthenticationPortMapping $portmap

4) NAT Table: Configure the NAT device with the same mapping that was passed to Set-VMReplicationServer. The example here uses an RRAS-based NAT device; a similar configuration can be done on a device from any vendor of your choice. The screenshot captures the mapping for the Hyper-V Replica Broker. A similar mapping needs to be created for each of the replica servers.

5) Ensure that the primary server can resolve the replica servers and broker to the public IP address of the NAT device and ensure that the appropriate firewall rules have been enabled.
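To make the relationship between the port map and the NAT table concrete, here is a minimal sketch that derives one forwarding rule per listener from the hash table in step 2. The private addresses are the ones from the example cluster (the broker uses the static address from step 1); the $privateAddress and $natRules variables and the rule layout are illustrative assumptions, not the format any particular NAT device expects.

```powershell
# Private addresses from the example cluster; the broker uses the static address from step 1.
$privateAddress = @{
    "R1.contoso.com"         = "192.168.1.2"
    "R2.contoso.com"         = "192.168.1.3"
    "R3.contoso.com"         = "192.168.1.4"
    "HVR-Broker.contoso.com" = "192.168.1.5"
}

# Same mapping that is passed to -KerberosAuthenticationPortMapping.
$portmap = @{"R1.contoso.com"=8081; "R2.contoso.com"=8082; "R3.contoso.com"=8003; "HVR-Broker.contoso.com"=8080}

# One rule per listener: traffic arriving on the public port is forwarded
# to the private address of that server, on the same port.
$natRules = foreach ($name in $portmap.Keys) {
    [pscustomobject]@{
        Server         = $name
        PublicPort     = $portmap[$name]
        PrivateAddress = $privateAddress[$name]
        PrivatePort    = $portmap[$name]
    }
}

# A port-based NAT can only route unambiguously if every public port is unique.
$duplicates = @($natRules | Group-Object PublicPort | Where-Object { $_.Count -gt 1 })
if ($duplicates.Count -gt 0) {
    throw "Two or more servers share a listener port; the NAT cannot tell them apart."
}

$natRules | Sort-Object PublicPort | Format-Table -AutoSize
```

The uniqueness check mirrors the reason the default configuration fails behind a port-based NAT: with every listener on the same port, all rules would collide.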

That’s it – you are all set! Replication works seamlessly as before and now you have the capability to reach the Replica server in a port based NAT environment.

↧

What’s new in Hyper-V Replica in Windows Server 2012 R2

October 22, 2013, 5:03 am


18th October 2013 marked the General Availability of Windows Server 2012 R2. The teams have delivered an amazing set of features in this short release cycle, and Brad’s post at http://blogs.technet.com/b/in_the_cloud/archive/2013/10/18/today-is-the-ga-for-the-cloud-os.aspx captures the investments made across the board. We encourage you to update to the latest version and share your feedback.

This post captures the top 8 improvements made to Hyper-V Replica in Windows Server 2012 R2. We will be diving deep into each of these features in the coming weeks through blog posts and TechNet articles.

Seamless Upgrade

You can upgrade from Windows Server 2012 to Windows Server 2012 R2 without having to redo initial replication (re-IR) for your protected VMs. With new features such as cross-version live migration, it is easy to maintain your DR story across OS upgrades. You can also choose to upgrade your primary site and replica site at different times, as Hyper-V Replica will replicate your virtual machines from a Windows Server 2012 environment to a Windows Server 2012 R2 environment.

30 second replication frequency

Windows Server 2012 allowed customers to replicate their virtual machines at a preset 5-minute replication frequency. Our aspiration to bring down this replication frequency was backed by customers asking for the flexibility to set different replication frequencies for different virtual machines. With Windows Server 2012 R2, you can now asynchronously replicate your virtual machines at a 30-second, 5-minute or 15-minute frequency.

Additional Recovery Points

Customers can now have a longer retention with 24 recovery points. These 24 (up from 16 in Windows Server 2012) recovery points are spaced at an hour’s interval.

Linux guest OS support

Hyper-V Replica, since its first release, has been agnostic to the application and guest OS. However, certain capabilities were unavailable on non-Windows guest OSes in its initial avatar. With Windows Server 2012 R2, we are tightly integrated with non-Windows OSes to provide file-system-consistent snapshots and inject IP addresses as part of the failover workflow.

Extended Replication

You can now ‘extend’ your replica copy to a third site using the ‘Extended replication’ feature. The functionality provides an added layer of protection to recover from your disaster. You can now have a replica copy within your site (eg: ClusterA->ClusterB in your primary datacenter) and extend the replication for the protected VMs from ClusterB->ClusterC (in your secondary data center).

To recover from a disaster in ClusterA, you can now quickly failover to the VMs in ClusterB and continue to protect them to ClusterC. More on extended replication capabilities in the coming weeks.

Performance Improvements

Significant architectural investments were made to lower the IOPS and storage resources required on the Replica server. The most important of these was to move away from snapshot-based recovery points to “undo logs” based recovery points. These changes have a profound impact on the way the system scales up and consumes resources, and will be covered in greater detail in the coming weeks.

Online Resize

In Windows Server 2012, Hyper-V Replica was closely integrated with various Hyper-V features such as VM migration and storage migration. Windows Server 2012 R2 allows you to resize the virtual disks of a running VM, and if your VM is protected, you can continue to replicate it without having to re-IR the VM.

Hyper-V Recovery Manager

We are also excited to announce the paid preview of Hyper-V Recovery Manager (HRM)(http://blogs.technet.com/b/scvmm/archive/2013/10/21/announcing-paid-preview-of-windows-azure-hyper-v-recovery-manager.aspx). This is a Windows Azure Service that allows you to manage and orchestrate various DR workflows between the primary and recovery datacenters. HRM does *not* replicate virtual machines to Windows Azure – your data is replicated directly between the primary and recovery datacenter. HRM is the disaster recovery “management head” which is offered as a service on Azure.

↧

Online resize of virtual disks attached to replicating virtual machines

November 14, 2013, 2:30 am


In Windows Server 2012 R2, Hyper-V added the ability to resize the virtual disks attached to a running virtual machine without having to shut down the virtual machine. In this blog post we will talk about how this feature works with Hyper-V Replica, the benefits of this capability, and how to make the most of it.

Works better with Hyper-V Replica

There is an obvious benefit in having the ability to resize a virtual disk while the VM is running – there is no need for downtime of the VM workload. There is however a subtle nuance and very key benefit for virtual machines that have also been enabled for replication – there is no need to resync the VM after modifying the disk, and definitely no need to delete and re-enable replication!

There is some history to this that needs explaining. Starting with Windows Server 2012, Hyper-V Replica provided a way to track the changes that a guest OS was making on the disks attached to the VM – and then replicated these changes to provide DR. However the tracking and replication was applicable only to running VMs. This meant that when a VM was switched off, Hyper-V Replica had no way to track and replicate any changes that might be done to the virtual disks outside of the guest. To guarantee that the replica VM was always in sync with the primary, Hyper-V Replica put the virtual machine into “Resynchronization Required” state if it suspected that the primary virtual disks had been modified offline.

So in Windows Server 2012, resizing your disk offline has an immediate consequence: the VM goes into resync when it is started up again. Resyncing the VM can get very expensive in terms of IOPS consumption, and you lose any additional recovery points that were already created.

Naturally, we made sure that it all went away in the Windows Server 2012 R2 release - no workload downtime, no resync, no loss of additional recovery points!

Making it happen – workflows for replicating VMs

The virtual disks need to be resized on each site separately; resizing the primary site virtual disks doesn’t automatically resize the replica site virtual disks. Here is the suggested workflow for making this happen:

1. On the primary site, select the virtual disk that needs to be resized and use the Edit Disk wizard to increase or decrease the size of the disk. You can also use the Resize-VHD PowerShell cmdlet. At this point, replication isn’t impacted and continues uninterrupted. This is because the newly created space shows up as “Unallocated”. That is, it has not been formatted and presented to the guest workload to use, and so there are no writes to that region that need to be tracked and replicated.
2. On the replica site, select the corresponding virtual disk and resize it using the Edit Disk wizard or the Resize-VHD PowerShell cmdlet. Not resizing the replica site virtual disk can cause replication errors in the future, which we cover in greater detail below.
3. Use Disk Management or an equivalent tool in the guest VM to consume this unallocated space.

Voila! That’s it. Nothing extraordinary required for replicating VMs. Sounds too good to be true? Well, it is :). In fact, you can automate steps 1 and 2 using some nifty PowerShell scripting.

param (
    [string]$vmname  = $(throw "-VMName is required"),
    [string]$vhdpath = $(throw "-VHDPath is required"),
    [long]$size      = $(throw "-Size is required")
)

#Resize the disk on the primary site
Resize-VHD -Path $vhdpath -SizeBytes $size -Verbose

$replinfo      = Get-VMReplication -VMName $vmname
$replicaserver = $replinfo.CurrentReplicaServerName
$id            = $replinfo.Id
$vhdname       = $vhdpath.Substring($vhdpath.LastIndexOf("\"))

#Find the VM on the replica site, find the right disk, and resize it
Invoke-Command -ComputerName $replicaserver -Verbose -ScriptBlock {
    $vhds = Get-VHD -VMId $Using:id
    foreach( $disk in $vhds ) {
        if($disk.Path.contains($Using:vhdname)) {
            Resize-VHD -Path $disk.Path -SizeBytes $Using:size -Verbose
        }
    }
}
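The disk-matching step in the script is easy to misread, so here is a small sketch of just that part, run against plain strings rather than real disks. The paths and the Get-VhdNameSuffix helper are hypothetical and exist only for illustration. Note that the Substring call keeps the leading backslash, which is what stops a disk whose name merely ends with the same characters from matching.

```powershell
# Hypothetical helper: returns the file name of the VHD, including the leading backslash,
# exactly as the script above computes $vhdname.
function Get-VhdNameSuffix {
    param([string]$VhdPath)
    $VhdPath.Substring($VhdPath.LastIndexOf("\"))
}

# Hypothetical primary-site path.
$suffix = Get-VhdNameSuffix "C:\ClusterStorage\Volume1\VM1\disk1.vhdx"

# Hypothetical replica-site paths: the folder layout differs, the file name does not.
$replicaPaths = @(
    "D:\Replica\Virtual Hard Disks\disk1.vhdx"
    "D:\Replica\Virtual Hard Disks\olddisk1.vhdx"
)

# Same test as in the script: the replica path must contain "\disk1.vhdx".
$matching = @($replicaPaths | Where-Object { $_.Contains($suffix) })

$suffix
$matching
```

One thing to keep in mind is that String.Contains is case-sensitive, so this approach assumes the file name has the same casing on both sites.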

Handling error scenarios

If the resized virtual disk on the primary is consumed before the replica has been resized, then you can expect the replica site to throw up errors. This is because the changes on the primary site cannot be applied correctly on the replica site. Fortunately, the error message is friendly enough to put you on the right track to fixing it: “An out-of-bounds write was encountered on the Replica virtual machine. The primary server VHD might have been resized. Ensure that the disk sizes of the Primary and Replica virtual machines are the same.”

The fix is just as simple:

1. Resize the virtual disk on the Replica site (as was meant to be done).
2. Resume replication on the VM from the Primary site. It will replicate and apply pending logs, without triggering resynchronization.

A similar situation will be encountered if the VM is put into resync after the resize operation. The resync operation will not proceed as the two disks have different sizes. Ensuring that the Replica disk is resized appropriately and resuming replication will be sufficient for resynchronization to continue.

Nuances during failover

If you keep additional recovery points for your replicating VM, there are some key points to be noted:

Expanding a virtual disk that is replicating will have no impact on failover. However, the size of the disk will not be reduced if you fail over to an older point that was created before the expand operation.
Shrinking a virtual disk that is replicating will have an impact on failover. Attempting to fail over to an older point that was created before the shrink operation will result in an error.

This behavior is seen because failing over to an older point only changes the content on the disk – and not the disk itself. Irrespective, in all cases, failing over to the latest point is not impacted by the resize operations.

Hope this post has been useful! We welcome you to share your experience and feedback with us.

↧

Upgrading to Windows Server 2012 R2 with Hyper-V Replica

December 2, 2013, 7:46 am


The TechNet article http://technet.microsoft.com/en-us/library/dn486799.aspx provides detailed guidance on migrating Hyper-V VMs from a Windows Server 2012 deployment to a Windows Server 2012 R2 deployment.

http://technet.microsoft.com/en-us/library/dn486792.aspx calls out the various VM migration techniques which are available as part of upgrading your deployment. The section titled “Hyper-V Replica” calls out further guidance for deployments which have replicating virtual machines.

At a very high level, if you have a Windows Server 2012 setup containing replicating VMs, we recommend that you use the cross-version live migration feature to migrate your replica VMs first. This is followed by fix-ups on the primary replicating VM (e.g. changing the replica server name). Once replication is back on track, you can migrate your primary VMs from a Windows Server 2012 server to a Windows Server 2012 R2 server without any VM downtime. The authorization table on the replica server may need to be updated once the primary VM migration is complete.

The above approach does not require you to re-IR your VMs, ensures zero downtime for your production VMs and gives you the flexibility to stagger the upgrade process on your replica and primary servers.

↧

Hyper-V Replica: Extend Replication

December 9, 2013, 4:07 pm


With the Extend Replication feature of Hyper-V Replica in Windows Server 2012 R2, customers can keep multiple copies of their data to protect against different outage scenarios. For example, as a customer I might choose to keep my second DR site on the same campus or a few miles away, while keeping a third copy of the data on another continent for added protection of my workloads. Extended replication addresses exactly this need by providing one more copy of the workload at an extended site, in addition to the replica site. As mentioned in What’s new in Hyper-V Replica in Windows Server 2012 R2, you can extend replication from the Replica site and continue to protect your virtualized workloads even after a disaster at the primary site.

This is so cool and exactly what I was looking for. But how do I enable this feature in Windows Server 2012 R2? I will walk you through the different ways in which you can extend replication, and you will see how similar the experience is to the Enable Replication wizard.

Extend Replication through UI:

Before you extend replication to a third site, you need to establish replication between a primary server and a replica server. Once that is done, go to the replica site and, in Hyper-V Manager, select the VM for which you want to extend replication. Right-click the VM and select “Replication->Extend Replication …”. This opens the Extend Replication wizard, which is similar to the Enable Replication wizard. A few points to take care of:

1. In the Configure Replication Frequency screen, note that Extend Replication only supports the 5-minute and 15-minute replication frequencies. Also note that the replication frequency of the extended relationship must be equal to or greater than that of the primary replication relationship.

2. In the Configure Additional Recovery Points screen, you can specify the recovery points you need on the extended replica server. Please note that you cannot configure the app-consistent snapshot frequency in this wizard.

Click Finish and you are done! Isn’t it very similar to the Enable Replication wizard?

If you are working with clusters, go to Failover Cluster Manager on the replica site and, from the Roles tab, select the VM for which you want to extend replication. Right-click the VM and select “Replication->Extend Replication”. Configure the extended replica cluster/server in the same way as above.

Extend Replication using PowerShell:

You can use the same PowerShell cmdlet which you used for enabling Replication to create extended replication relationship. However as stated above, you can only choose a replication frequency of either 5 minutes or 15 minutes.

Enable-VMReplication -VMName <vmname> -ReplicaServerName <extended_server_name> -ReplicaServerPort <Auth_port> -AuthenticationType <Certificate/Kerberos> -ReplicationFrequencySec <300/900> [other optional parameters if needed]
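The two frequency rules for extended replication (only 300 or 900 seconds, and never more frequent than the primary relationship) can be checked before calling the cmdlet. The following is a minimal sketch; Test-ExtendedReplicationFrequency is a hypothetical helper written for this post, not part of the Hyper-V module.

```powershell
# Hypothetical helper: returns $true when the extended frequency satisfies both rules
# described above, $false otherwise. Frequencies are in seconds.
function Test-ExtendedReplicationFrequency {
    param(
        [Parameter(Mandatory = $true)][int]$PrimaryFrequencySec,
        [Parameter(Mandatory = $true)][int]$ExtendedFrequencySec
    )

    # Rule 1: extended replication supports only 5 minutes (300) or 15 minutes (900).
    $supported = 300, 900

    # Rule 2: the extended interval must be equal to or greater than the primary interval.
    ($supported -contains $ExtendedFrequencySec) -and ($ExtendedFrequencySec -ge $PrimaryFrequencySec)
}

Test-ExtendedReplicationFrequency -PrimaryFrequencySec 30  -ExtendedFrequencySec 300   # allowed
Test-ExtendedReplicationFrequency -PrimaryFrequencySec 900 -ExtendedFrequencySec 300   # not allowed
```

In practice the primary frequency would be whatever was chosen when replication was first enabled for the VM; it is passed in by hand here to keep the sketch self-contained.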

Status and Health of Extended Replication:

Once you extend replication from the replica site, you can check the Replication tab in Hyper-V Manager on the Replica site, and you will see details about the extended replication alongside the primary relationship.

You can also check the health statistics of extended replication from Hyper-V Manager. Go to the VM on the Replica site, right-click it and select “Replication->View Replication Health”. Extended replication health statistics are displayed under a separate tab named “Extended Replication”.

You can also query PowerShell on the replica site to see details about Extended Replication Relationship.

Measure-VMReplication -VMName <name> -ReplicationRelationshipType Extended | select *

This is all great. But how do I carry out failover in the case of extended replication? I will reserve that for my next blog post. Until then, happy extended replication :)

↧

Hyper-V Replica in Windows Server 2012 R2 and System Center Operations Manager 2012 R2

December 10, 2013, 5:35 am


Continuing from my previous post, Monitoring Hyper-V Replica using System Center Operations Manager, in this blog post I will walk through some of the things to take care of while using System Center Operations Manager 2012 R2 to monitor Windows Server 2012 R2 hosts. If you haven’t read the previous post, I suggest you go through it before you start monitoring Windows Server 2012 R2 machines. The best part of this story is that all the monitors present in the previous version of SCOM work with this new version of the OS.

As mentioned in What is New in Windows Server 2012 R2, we have added the ability to configure the replication frequency of VMs. You can now replicate at 30 seconds, 5 minutes or 15 minutes. Correspondingly, the alerts generated by the “Hyper-V 2012 Replication Count Percent Monitor” change based on the VM. If you have VMs with varying replication frequencies, coming up with a single percentage becomes tricky. We suggest setting the count percent to a value that catches missed cycles for the VM with the shortest replication interval. For example, suppose I have VMs replicating every 30 seconds, 5 minutes and 15 minutes, and I want to be notified if even one replication cycle is missed in an interval of one hour. That translates to 1/(2*60) for the 30-second VMs, 1/12 for the 5-minute VMs and 1/4 for the 15-minute VMs. By setting the count percent to 1/(2*60), I catch alerts from every VM that missed one replication cycle in the 60-minute interval.
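The arithmetic above is simple enough to script when you have more frequencies or a different monitoring window. The sketch below is only an illustration: Get-MissedCyclePercent is a hypothetical helper, and it assumes the threshold is expressed as the percentage of cycles missed within the window.

```powershell
# Hypothetical helper: percentage of replication cycles in the window that
# $MissedCycles missed cycles represent, for a VM replicating every $FrequencySec seconds.
function Get-MissedCyclePercent {
    param(
        [Parameter(Mandatory = $true)][int]$FrequencySec,
        [int]$WindowMinutes = 60,
        [int]$MissedCycles  = 1
    )

    # Number of replication cycles expected in the window (120 for 30 seconds in one hour).
    $cyclesInWindow = ($WindowMinutes * 60) / $FrequencySec

    [math]::Round(100 * $MissedCycles / $cyclesInWindow, 2)
}

# The three frequencies supported by Windows Server 2012 R2.
$thresholds = foreach ($freq in 30, 300, 900) {
    [pscustomobject]@{
        FrequencySec = $freq
        Percent      = Get-MissedCyclePercent -FrequencySec $freq
    }
}

# The smallest value is the one that catches a single missed cycle on every VM.
$strictest = ($thresholds | Measure-Object -Property Percent -Minimum).Minimum

$thresholds
$strictest
```

The smallest percentage always comes from the most frequently replicating VM, which is why the 30-second VMs drive the setting in the example.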

The rest of the monitors work just as they do in Windows Server 2012. What is more, the Hyper-V Extensions management pack written by Cristian Edwards Sabathe now supports Windows Server 2012 R2. You can download the pack from here. In addition to the existing dashboards, it now supports Extended Replica VMs. Go try it out!

↧

Using data deduplication with Hyper-V Replica for storage savings

December 23, 2013, 12:40 am


Protection of data has always been a priority for customers, and disaster recovery allows the protection of data with better restore times and lower data loss at the time of failover. However, as with all protection strategies, additional storage is a cost that needs to be incurred. With storage usage growing exponentially, a strategy is needed to help enterprises control their spend on storage hardware. This is where data deduplication comes in. Deduplication itself has been around for years, but in this blog post we will talk about how users of Hyper-V Replica (HVR) can benefit from it. This blog post has been written collaboratively with the Windows Server Data Deduplication team.

Deduplication considerations

To begin with, it is important to acknowledge the workloads that are suitable for deduplication using Windows Server 2012 R2. There is an excellent TechNet article that covers this aspect and would be applicable in the case of Hyper-V Replica as well. It is important to remember that deduplication of running virtual machines is only officially supported starting with Windows Server 2012 R2 for Virtual Desktop Infrastructure (VDI) workloads with VHDs running on a remote file server. Generic VM (non-VDI) workloads may run on a deduplication enabled volume but the performance is not guaranteed. Windows Server 2012 deduplication is only supported for cold data (files not open).

Why use deduplication with Hyper-V Replica?

One of the most common deployment scenarios of VDI involves a golden image that is read-only. VDI virtual machines are built using diff-disks that have this golden image as the parent. The setup would look roughly like this:

This deployment saves a significant amount of storage space. However, when Hyper-V Replica is used to replicate these VMs, each diff-disk chain is treated as a single unit and is replicated. So on the replica site there will be one copy of the golden image per replicated VM (3 copies in this example) as a part of the replication.

Data deduplication becomes a great way to reclaim that space used.

Deployment options

Data deduplication is applicable at a volume level, and the volume can be made available with either SMB 3.0, CSVFS, or NTFS. The deployments (at either the Primary or Replica site) would broadly look like these:

1. SMB 3.0

2. CSVFS

3. NTFS

Ensure that the VHD files that need to be deduplicated are placed in the right volume – and this can be done using authorization entries. Using HVR in conjunction with Windows Server Data Deduplication will require some additional planning to take into consideration possible performance impacts to HVR when running on a volume enabled for deduplication.

Deduplication on the Primary site

Enabling data deduplication on the primary site volumes will not have an impact on HVR. No additional configurations or changes need to be done to use Hyper-V Replica with deduplicated data volumes.

Deduplication on the Replica site

WITHOUT ADDITIONAL RECOVERY POINTS

Enabling data deduplication on the replica site volumes will not have an impact on HVR. No additional configurations or changes need to be done to use Hyper-V Replica with deduplicated data volumes.

WITH ADDITIONAL RECOVERY POINTS

Hyper-V Replica allows the user to have additional recovery points for replicated virtual machines that allows the user to go back in time during a failover. Creating the recovery points involves reading the existing data from the VHD before the log files are applied. When the Replica VM is stored on a deduplication-enabled volume, reading the VHD is slower and this impacts the time taken by the overall process. The apply time on a deduplication-enabled VHD can be between 5X and 7X more than without deduplication. When the time taken to apply the log exceeds the replication frequency then there will be a log file pileup on the replica server. Over a period of time this can lead to the health of the VM degrading. The other side effect is that the VM state will always be “Modifying” and in this state other Hyper-V operations and backup will not be possible.
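As a rough way to reason about the pileup condition, the sketch below compares an estimated log apply time against the replication frequency. Everything in it is an assumption for illustration: Test-LogApplyKeepsUp is a hypothetical helper, the baseline apply time is a number you would have to measure on a volume without deduplication, and the 5X to 7X range is read here as a simple multiplier on that baseline.

```powershell
# Hypothetical helper: $true when the slowed-down apply time still fits inside one
# replication interval, $false when log files would start to pile up.
function Test-LogApplyKeepsUp {
    param(
        [Parameter(Mandatory = $true)][double]$BaselineApplySec,  # measured without deduplication
        [Parameter(Mandatory = $true)][int]$FrequencySec,         # 30, 300 or 900
        [double]$SlowdownFactor = 7                               # worst case of the 5X to 7X range
    )

    $estimatedApplySec = $BaselineApplySec * $SlowdownFactor
    $estimatedApplySec -lt $FrequencySec
}

# A log that takes 10 seconds to apply without deduplication (hypothetical figure):
Test-LogApplyKeepsUp -BaselineApplySec 10 -FrequencySec 30    # 70 s estimated, exceeds 30 s
Test-LogApplyKeepsUp -BaselineApplySec 10 -FrequencySec 300   # 70 s estimated, fits in 300 s
```

This makes it easier to see why VMs on the 30-second frequency are the first to run into trouble on a deduplication-enabled replica volume.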

There are two mitigation steps suggested:

1. Defragment the deduplication-enabled volume on a regular basis. This should be done at least once every 3 days, and preferably once a day.
2. Increase the frequency of deduplication optimization. For instance, set the deduplication policy to optimize data older than 1 day instead of the default 3 days. Increasing the deduplication frequency will allow the deduplication service on the recovery server to keep up better with the changes made by HVR. This can be configured via the deduplication settings in Server Manager -> File and Storage Services -> Volume -> Configure Data Deduplication, or via PowerShell:

Set-DedupVolume <volume> -MinimumFileAgeDays 1

Other resources:

http://blogs.technet.com/b/filecab/archive/2013/07/31/extending-data-deduplication-to-new-workloads-in-windows-server-2012-r2.aspx

http://blogs.technet.com/b/filecab/archive/2013/07/31/deploying-data-deduplication-for-vdi-storage-in-windows-server-2012-r2.aspx

↧

Measuring Replication Health in a cluster

December 30, 2013, 10:51 pm


As part of a mini-scale run in my lab, I had to frequently monitor the replication health and also note down the replication statistics. The statistics are available by right-clicking the VM (in Hyper-V Manager or Failover Cluster Manager), choosing the Replication submenu and clicking the View Replication Health… option.

Clicking this option displays the replication statistics I am looking for.

Clicking ‘Reset Statistics’ clears the statistics collected so far and resets the start time (the “From time” field).

In a large deployment, it’s not practical to right click on each VM to get the health statistics. Hyper-V PowerShell cmdlets help in simplifying the task. I had two requirements:

Requirement #1: Get a report of the average size of the log files which were being sent during each VM’s replication interval
Requirement #2: Snap all the VMs’ replication statistics to the same start time (“From time” field) and reset the statistics

Measure-VMReplication provides the replication statistics for each of the replicating VMs. As I am only interested in the average replication size, the following cmdlet provides the required information.

Measure-VMReplication | select VMName,AvgReplSize

Like most of the other PowerShell cmdlets, Measure-VMReplication takes the computer name as an input. To get the replication stats for all the VMs in the cluster, I would need to enumerate the nodes of the cluster and pipe the output to this cmdlet. The Get-ClusterNode cmdlet is used to get the nodes of the cluster.

$ClusterName = "<Name of your cluster>"
Get-ClusterNode -Cluster $ClusterName

We can pipe each node of the cluster into Measure-VMReplication to get the replication health of the VMs present on that node:

Get-ClusterNode -Cluster $ClusterName | foreach-object {Measure-VMReplication -ComputerName $_ | Select VMName, AvgReplSize, PrimaryServerName, CurrentReplicaServerName | ft}

Requirement #1 is met; now let’s look at requirement #2. To snap all the replicating VMs’ statistics to a common start time, I used the Reset-VMReplicationStatistics cmdlet, which takes the VMName as an input. However, if Reset-VMReplicationStatistics is used on a non-replicating VM, the cmdlet errors out with the following error message:

Reset-VMReplicationStatistics : 'Reset-VMReplicationStatistics' is not applicable on virtual machine 'IOMeterBase'.
The name of the virtual machine is IOMeterBase and its ID is c1922e67-7a8b-4f36-a868-5174e7b6821a.
At line:1 char:1
+ Reset-VMReplicationStatistics -vmname IOMeterBase
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (Microsoft.Hyper...l.VMReplication:VMReplication) [Reset-VMReplicationStatistics], VirtualizationOperationFailedException
    + FullyQualifiedErrorId : InvalidOperation,Microsoft.HyperV.PowerShell.Commands.ResetVMReplicationStatisticsCommand

It’s a touch messy and to address the issue, we would need to isolate the replicating VMs in a given server. This can be done by querying only for those VMs whose ReplicationMode is set (to either Primary or Replica). The output of Get-VM is shown below

PS C:\> get-vm | select vmname, ReplicationMode | fl

VMName          : Cluster22-TPCC3
ReplicationMode : Primary

VMName          : IOMeterBase
ReplicationMode : None

Cluster22-TPCC3 is a replicating VM (Primary VM), while replication has not been enabled on the IOMeterBase VM. Putting things together, to get all the replicating VMs in the cluster, use the Get-VM cmdlet and filter on ReplicationMode (Primary or Replica). You could also use the not-equal operator to get both primary and replica VMs.

Get-ClusterNode -Cluster $ClusterName | ForEach-Object {Get-VM -ComputerName $_ | Where-Object {$_.ReplicationMode -eq "Primary"}}

To reset the statistics, pipe the above cmdlet to Reset-VMReplicationStatistics

PS C:\> Get-ClusterNode -Cluster $ClusterName | ForEach-Object {Get-VM -ComputerName $_ | Where-Object {$_.ReplicationMode -eq "Primary"} | Reset-VMReplicationStatistics}
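The not-equal variant mentioned earlier can be sketched as follows. This is only an illustration: it assumes $ClusterName is already set as above, and it will also return VMs in any other mode the platform reports besides Primary and Replica (for example test replicas), so check the output before piping it to Reset-VMReplicationStatistics.

```powershell
# Assumes $ClusterName has already been set, as earlier in this post.
Get-ClusterNode -Cluster $ClusterName | ForEach-Object {
    # Every VM on this node whose ReplicationMode is anything other than None
    Get-VM -ComputerName $_.Name |
        Where-Object { $_.ReplicationMode -ne "None" } |
        Select-Object VMName, ComputerName, ReplicationMode
}
```

Listing the VMs first, before resetting anything, makes it easy to confirm that only the intended VMs are picked up.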

Wasn’t that a lot easier than right clicking on each VM in your cluster and clicking on the ‘Reset Statistics’ button? :)

↧

Network Recommendations for a Hyper-V Cluster in Windows Server 2012

January 19, 2014, 6:23 am

We recently published a TechNet document http://technet.microsoft.com/library/dn550728.aspx which provides guidance on configuring your network for a Hyper-V Cluster in Windows Server 2012.

A snip of the summary from the document:

Windows Server 2012 supports the concept of converged networking, where different types of network traffic share the same Ethernet network infrastructure. In previous versions of Windows Server, the typical recommendation for a failover cluster was to dedicate separate physical network adapters to different traffic types. Improvements in Windows Server 2012, such as Hyper-V QoS and the ability to add virtual network adapters to the management operating system enable you to consolidate the network traffic on fewer physical adapters. Combined with traffic isolation methods such as VLANs, you can isolate and control the network traffic.

There are some major improvements from the Windows Server 2008 R2 guidance and there is a lot of emphasis on converged networking. The document also provides a practical example which isolates different kinds of traffic and assigns bandwidth ‘weight’.

↧

Update: Capacity Planner for Hyper-V Replica

January 21, 2014, 8:54 am

In May 2013, we released the first version of the Capacity Planner for Hyper-V Replica on Windows Server 2012. It allowed administrators to plan their Hyper-V Replica deployments based on the workload, storage, network, and server characteristics. While it is always possible to monitor every single perfmon counter to make an informed decision, a readymade tool always makes life simpler and easier.

The big plus comes from the fact that the guidance is based on actual workload and server characteristics, which makes it a level better than static input-based planning models. The tool picks the right counters to monitor, automates the metrics collection process, and generates an easily consumable report.

The tool and documentation have been updated for Windows Server 2012 R2 and can be downloaded from here: http://www.microsoft.com/en-us/download/details.aspx?id=39057

What’s new

We received feedback from our customers on how the tool can be made better, and we threw in a few improvements of our own. Here is what the updated Capacity Planner tool has:

Support for Windows Server 2012 and Windows Server 2012 R2 in a single tool
Support for Extended Replication
Support for virtual disks placed on NTFS, CSVFS, and SMB shares
Monitoring of multiple standalone hosts simultaneously
Improved performance and scale – up to 100 VMs in parallel
Replica site input is optional – for those still in the planning stage of a DR strategy
Report improvements – e.g.: reporting the peak utilization of resources also
Improved guidance in documentation
Improved workflow and user experience

In addition, the documentation has a section on how the tool can be used for capacity planning of Hyper-V Recovery Manager based on the ‘cloud’ construct of System Center Virtual Machine Manager.

So go ahead, use the tool in your virtual infrastructure and share your feedback and questions through this blog post. We would love to hear your comments!

28-Feb-2014 update: Keith Mayer has an excellent guided hands-on lab demo that can be found here.

↧

Hyper-V Replica debugging: Why are very large log files generated?

February 2, 2014, 11:00 pm

Quite a few customers have reached out to us with this question, and you can even see a few posts around this on the TechNet Forums. The query comes in various forms:

“My log file size was in the MBs and sometime at night it went into the GBs – what happened?”
“I have huge amounts of data to sync across once a day when no data is being changed in the guest”
“The size of the log file (the .hrl) is growing 10X…”

The problem here is not just the sharp increase in the .hrl file size, but also the fact that the network impact of this churn was not accounted for during the planning stages of the datacenter fabric. Thus there isn't adequate network bandwidth between the primary and the replica to transfer the huge log files being generated.

As a first step, the question that customers want answered is: What is causing this churn inside the guest?

Step 1: Isolate the high-churning processes

Download the script from here: http://gallery.technet.microsoft.com/Hyper-V-Replica-Identify-f09763b6, and copy the script into the virtual machine. The script collects information about the writes done by various processes and writes log files with this data.

I started the debugging process using the script on a SQL Server virtual machine of my own. I copied the script into the VM and ran it in an elevated PowerShell window. You might run into PowerShell script execution policy restrictions, and you might need to set the execution policy to Unrestricted (http://technet.microsoft.com/en-us/library/ee176961.aspx).

At the same time, I was monitoring the VM using Perfmon from the host and checking to see if there is any burst of disk activity seen. The blue line in the Perfmon graph is something I was not expecting to see, and it is significantly higher than the rest of the data – the scale for the blue line is 10X that of the red and green lines. (Side note: I was also monitoring the writes from within the guest using Perfmon… to see if there was any mismatch. As you can see from the screenshot below, the two performance monitors are rather in sync :))

At this point, I have no clue what in the guest is causing this sort of churn to show up. Fortunately I have the script collecting data inside the guest that I will use for further analysis.

Pull out the two files from the guest VM for analysis in Excel: ProcStats-2.csv and HVRStats-2.csv. Before starting the analysis, one additional bit of Excel manipulation that I added was to include a column called Hour-Minute: it pulls out only the hour and minute from the timestamp (ignoring the seconds) and is used in the PivotTable analysis as a field. I use the following formula in the cell: =TIME(HOUR(A2), MINUTE(A2), 0) where A2 is the timestamp cell for that row. Copy it down and it’ll adjust the formula appropriately.

Overall write statistics (HVR Stats)

Let’s first look at the file HVRStats-2.csv in Excel. Use the data to create a PivotTable and a PivotChart, which gives a summarized view of the writes happening. What we see is that an excessive amount of data gets written at 4:57 AM and 4:58 AM. This is more than 30X the data written at other times.
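If you would rather stay in PowerShell than use Excel, the same per-minute roll-up can be sketched as below. The column names Timestamp and BytesWritten and the sample rows are assumptions made for illustration; the CSV files produced by the script may use different headers, so adjust the property names to match.

```powershell
# Hypothetical sample rows; the real CSV column names may differ.
$rows = @(
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-30 04:56:10'; BytesWritten = 1MB }
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-30 04:57:05'; BytesWritten = 40MB }
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-30 04:57:45'; BytesWritten = 35MB }
)

function Get-WritesPerMinute {
    param([object[]]$Rows)

    $Rows |
        # Same idea as the Hour-Minute column: drop the seconds from the timestamp
        Group-Object { ([datetime]$_.Timestamp).ToString('HH:mm') } |
        ForEach-Object {
            [pscustomobject]@{
                HourMinute   = $_.Name
                BytesWritten = ($_.Group | Measure-Object -Property BytesWritten -Sum).Sum
            }
        } |
        Sort-Object HourMinute
}

Get-WritesPerMinute -Rows $rows | Format-Table -AutoSize
```

To run it against a real file, replace $rows with the output of Import-Csv on the file you copied out of the guest.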

Per process write statistics

Now let’s look at ProcStats-2.csv in Excel. Use the data to create a PivotTable and PivotChart – and this should give us a per-process view of what is happening. With the per-process information, we can easily plot the data written by each process and identify the culprit. In this case, SQL Server itself caused a spike in the data written (highlighted in red)

This is what the graph looks like for a large data copy operation (~1.5 GB). There is a burst of writes between 1:52PM and 1:53PM in Explorer.exe – and this corresponds to the copy operation that was initiated.

What next?

At this point, you should be able to differentiate between the following process classes using the process name and PID:

Primary guest workload (eg: SQL Server)
Windows inbox processes (eg: page file, file copy, defragment, search indexer…)
Other/3rd party processes (eg: backup agent, anti-virus…)

Step 2: Which files are being modified?

Isolating the file sometimes helps in identifying the underlying operation. Once you know which process is causing the churn and at approximately what time, you can use the inbox tool Resource Monitor (resmon.exe) to track the Disk Activity, filtering the view to show only the processes you are interested in.

From the previous step you will have the details of the process causing the churn, for example System (PID 4). Resource Monitor then shows which file that process is modifying, for example C:\pagefile.sys. This would lead you to the conclusion that it is the page file that is being churned.

Alternative tools:

Process Monitor: http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
Windows Performance Recorder and Windows Performance Analyzer:
- http://msdn.microsoft.com/en-us/library/windows/hardware/hh448205.aspx
- http://msdn.microsoft.com/en-us/library/windows/hardware/hh448170.aspx

↧

Error 0x80090303 when enabling replication

February 6, 2014, 4:00 pm

When trying to enable replication on one of my VMs in my lab setup, I encountered the following error – Hyper-V failed to authenticate the Replica server <server name> using Kerberos authentication. Error: The specified target is unknown or unreachable (0x80090303).

Needless to say, I was able to reach the replica server (prb2.hvrlab.com in my case), firewall settings in the replica server looked ok and I was able to TS and login to the replica server as well. As the error message indicated that the failure was encountered when authenticating the replica server, I decided to check the event viewer logs on the replica server. A couple of errors caught my eye:

(1) SPN registration failures

(2) This was followed by an error message which indicated that the authentication had failed

I was getting somewhere, so I ran the “setspn -l” command to list the currently registered SPNs for the computer, and the Hyper-V Replica entry was conspicuously absent.

I restarted the vmms service and when I re-ran the command, I could see the following (set of correct) entries
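The check-and-restart sequence above could look roughly like this when run on the Replica server from an elevated PowerShell window. The filter string 'Hyper-V Replica' is an assumption about how the entry appears in the setspn listing; drop the Select-String filter to see the full list.

```powershell
# List the SPNs registered for this computer account and keep only the
# lines that mention Hyper-V Replica (the filter text is an assumption).
setspn -L $env:COMPUTERNAME | Select-String -Pattern 'Hyper-V Replica'

# If nothing is returned, restart the Hyper-V Virtual Machine Management
# service so that it attempts the SPN registration again, then re-check.
Restart-Service -Name vmms
setspn -L $env:COMPUTERNAME | Select-String -Pattern 'Hyper-V Replica'
```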

I have seen SPN registration failures due to intermittent network blips (the following TechNet wiki article gives more info on SPN registration: http://social.technet.microsoft.com/wiki/contents/articles/1340.hyper-v-troubleshooting-event-id-14050-vmms.aspx). There are retry semantics to ensure that the SPN registration succeeds, but there could be corner cases (like my messed up lab setup) where manual intervention may be required to make quicker progress. I also stumbled upon an SPN wiki article, http://social.technet.microsoft.com/wiki/contents/articles/717.service-principal-names-spns-setspn-syntax-setspn-exe.aspx, which gives more info on how to manually register the SPN. I didn’t require the info today, but it’s a good read nevertheless.

After fixing the replica server, the enable replication call went through as expected. Back to work…

↧

Hyper-V Replica & Proxy Servers on primary site

February 8, 2014, 4:00 pm

I was tinkering around with my lab setup which consists of a domain, proxy server, primary and replica servers. There are some gotchas when it comes to Hyper-V Replica and proxy servers and I realized that we did not have any posts around this. So here goes.

If the primary server is behind a proxy server (forward proxy) and if Kerberos based authentication is used to establish a connection between the primary and replica server, you might encounter an error: Hyper-V cannot connect to the specified Replica server <servername> due to connection timed out. Verify if a network connection exists to the Replica server or if the proxy settings have been configured appropriately to allow replication traffic.

I have a Forefront TMG 2010 acting as a proxy server and the logs in the proxy server

I also had netmon running in my primary server and the logs didn’t indicate too much other than for the fact that the connection never made it to the replica server – something happened between the primary and replica server which caused the connection to be terminated. The primary server name in this deployment is prb8.hvrlab.com and the proxy server is w2k8r2proxy1.hvrlab.com.

If a successful connection goes through, you will see a spew of messages on netmon

When I first observed this issue while building the product, I reached out to the Forefront folks @ Microsoft to understand the behavior. I came to understand that the Forefront TMG proxy server terminates any outbound (or upload) connection whose content length (request header) is > 4GB.

Hyper-V Replica sets a high content length as we expect to transfer large files (VHDs), and doing so saves us the effort of re-establishing the connection each time. A closer inspection of a POST request shows the content length which is being set by Hyper-V Replica (ahem, ~500GB)

The proxy server returns a what-uh? response in the form of a bad-request

That isn’t super helpful by any means, and the error message unfortunately isn’t too specific either. But now you know the reason for the failure: the proxy server terminates the connection request and it never reaches the replica server.

So how do we work around it – there are two ways (1) Bypass the proxy server (2) Use cert based authentication (another blog for some other day).

The ability to bypass the proxy server is provided only in PowerShell, through the BypassProxyServer parameter of the Enable-VMReplication cmdlet - http://technet.microsoft.com/en-us/library/jj136049.aspx. When the flag is enabled, the request (for lack of a better word) bypasses the proxy server. Eg:

Enable-VMReplication -vmname NewVM5 -AuthenticationType Kerberos -ReplicaServerName prb2 -ReplicaServerPort 25000 -BypassProxyServer $true
Start-VMInitialReplication -vmname NewVM5

This is not available in the Hyper-V Manager or Failover Cluster Manager UI. It’s supported only in PowerShell (and WMI). Running the above cmdlets will create the replication request and start the initial replication.
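If replication has already been enabled on a VM, the same setting can likely be changed afterwards instead of re-enabling replication. The sketch below assumes the VM name NewVM5 from the example above and relies on the BypassProxyServer parameter of Set-VMReplication; verify against the cmdlet help on your build before depending on it.

```powershell
# Assumption: replication for NewVM5 is already enabled and only the
# proxy setting needs to change.
Set-VMReplication -VMName NewVM5 -BypassProxyServer $true

# Inspect the resulting replication settings for the VM
Get-VMReplication -VMName NewVM5 | Format-List *
```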

↧

Hyper-V Replica Certificate based authentication and Proxy servers

February 17, 2014, 5:45 am

Continuing from where we left off, I have a small lab deployment which consists of an AD, DNS, Proxy server (Forefront TMG 2010 on WS 2008 R2 SP1), primary servers and replica servers. When the primary server is behind the proxy (forward proxy) and when I tried to enable replication using certificate based authentication, I got the following error message: The handle is in the wrong state for the requested operation (0x00002EF3)

That didn’t convey too much, did it? Fortunately I had netmon running in the background and the only set of network traffic which was seen was between the primary server and the proxy. A particular HTTP response caught my eye:

The highlighted text indicated that the proxy was terminating the connection and returning a ‘Bad gateway’ error. A closer look at the TMG error log indicated that the error was encountered during the https-inspect state.

After some bing’ing of the errors, the pieces began to emerge. When HTTPS inspection is enabled, the TMG server terminates the connection and establishes a new connection (in our case to the replica server), acting as a trusted man-in-the-middle. This doesn’t work for Hyper-V Replica as we mutually authenticate the primary and replica server endpoints. To work around the situation, I disabled HTTPS inspection in the proxy server and things worked as expected. The primary server was able to establish the connection and replication was on track.

↧

Backup of a Replica VM

April 24, 2014, 4:21 am

This blog post covers the scenarios and motivations that drive the backup of a Replica VM, and product guidance to administrators.

Why backup a Replica VM?

Ever since the advent of Hyper-V Replica in Windows Server 2012, customers have been interested in backing up the Replica VM. Traditionally, IT administrators have taken backups of the VM that contains the running workload (the primary VM) and backup products have been built to cater to this need. So when a significant proportion of customers talked about the backup of Replica VMs, we were intrigued. There are a few key scenarios where backup of a Replica VM becomes useful:

Reduce the impact of backup on the running workload: Taking the backup of a VM involves the creation of a snapshot/diff-disk to baseline the changes that need to be backed up. For the duration of the backup job, the workload is running on a diff-disk and there is an impact on the system when that happens. By offloading the backup to the Replica site, the running workload is no longer impacted by the backup operation. Of course, this is applicable only to deployments where the backup copy is stored on the remote site. For example, the daily backup operation might store the data locally for quicker restore times, but monthly or quarterly backups for long-term retention, which are stored remotely, can be taken from the Replica VM.
Limited bandwidth between sites: This is typical of Branch Office-Head Office (BO-HO) kind of deployments where there are multiple smaller remote branch office sites and a larger central Head Office site. The backup data for the branch offices is stored in the head office, and an appropriate amount of bandwidth is provisioned by administrators to transfer the backup data between the two sites. The introduction of disaster recovery using Hyper-V Replica creates another stream of network traffic, and administrators have to re-evaluate their network infrastructure. In most cases, administrators either could not or were not willing to increase the bandwidth between sites to accommodate both backup and DR traffic. However they did come to the realization that backup and DR were independently sending copies of the same data over the network – and this was an area that could be optimized. With Hyper-V Replica creating a VM in the Head Office site, administrators could save on the network transfer by backing up the Replica VM locally rather than backing up the primary VM and sending the data over the network.
Backup of all VMs in the Hoster datacenter: Some customers use the Hoster datacenter as the Replica site, with the intention of not building a secondary datacenter of their own. Hosters have SLAs around the protection of all customer VMs in their datacenters – typically once a day backup. Thus the backup of Replica VMs becomes a requirement for the success of their business.

Thus various customer segments found that the backup of a Replica VM has value for their specific scenarios.

Data consistency

A key aspect of the backup operation is related to the consistency of the backed-up data. Customers have a clear prioritization and preference when it comes to data consistency of backed up VMs:

Application-consistent backup
Crash-consistent backup

And this prioritization applied to Replica VMs as well. Conversations with customers indicated that they were comfortable with crash-consistency for a Replica VM, if application-consistency was not possible. Of course, anything less than crash-consistency was not acceptable and customers preferred that backups fail rather than have inconsistent data getting backed up.

Attempting application-consistency

Typical backup products try to ensure application-consistency of the data being backed up (using the VSS framework) – and this works out well when the VM is running. However, the Replica VM is always turned off until a failover is initiated, and VSS is unable to guarantee application-consistent backup for a Replica VM. Thus getting application-consistent backup of a Replica VM is not possible.

Guaranteeing crash-consistency

In order to ensure that customers backing up Replica VMs always get crash-consistent data, a set of changes was introduced in Windows Server 2012 R2 that fails the backup operation if consistency cannot be guaranteed. The virtual disk could be inconsistent when any one of the conditions below is encountered, and in these cases backup is expected to fail.

1) HRL logs are being applied to the Replica VM
2) Previous HRL log apply operation was cancelled or interrupted
3) Previous HRL log apply operation failed
4) Replica VM health is Critical
5) VM is in the Resynchronization Required state or the Resynchronization in progress state
6) Migration of Replica VM is in progress
7) Initial replication is in progress (between the primary site and secondary site)
8) Failover is in progress

Dealing with failures

These are largely treated as transient error states and the backup product is expected to retry the backup operation based on its own retry policies. With 30 second replication and apply being supported in Windows Server 2012 R2, the backup operation is expected to collide with HRL log apply more frequently – resulting in error scenario 1 mentioned above. A robust retry mechanism is needed to ensure a high backup success rate. In case the backup product is unable to retry or cope with failures then an option is to explicitly pause the replication before the backup is scheduled to run.
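For backup tooling that is driven from a script, the retry behaviour described above can be sketched with a small helper. Invoke-WithRetry is a hypothetical name introduced here for illustration, not part of any Hyper-V or backup product, and it only retries on terminating errors raised by the script block.

```powershell
# Hypothetical helper: run an action, retrying on failure with a fixed delay.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock]$Action,
        [int]$MaxAttempts = 5,
        [int]$DelaySeconds = 60
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Return the action's output as soon as one attempt succeeds
            return & $Action
        }
        catch {
            # Give up and surface the last error once attempts are exhausted
            if ($attempt -eq $MaxAttempts) { throw }
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Alternative described above: pause replication around the backup window.
# Start-ReplicaBackup is a placeholder for whatever starts your backup job.
#   Suspend-VMReplication -VMName 'MyReplicaVM'
#   try     { Start-ReplicaBackup -VMName 'MyReplicaVM' }
#   finally { Resume-VMReplication -VMName 'MyReplicaVM' }
```

A fixed delay keeps the sketch short; since the most frequent collision is with a log apply that finishes on its own, a modest delay between attempts is usually a reasonable starting point.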

Key Takeaways

Impact on administrators

Backup of Replica VMs is better with Windows Server 2012 R2.
Only crash-consistent backup of a Replica VM is guaranteed.
A robust retry mechanism needs to be configured in the backup product to deal with failures. Or ensure that replication is paused when backup is scheduled.

Impact on backup vendors

The changes introduced in Windows Server 2012 R2 would benefit customers using any backup product to take backup of Replica VMs.
A robust retry mechanism would need to be built to deal with Replica VM failure.
For specific details on how Data Protection Manager (DPM) deals with the backup of Replica VMs, refer to this blog post.

Update 25-Apr-2014: The DPM-specific details on this post have been moved to the DPM blog.

↧

Optimizing Hyper-V Replica HTTPS traffic using Riverbed SteelHead

May 8, 2014, 10:54 am

Hyper-V Replica supports both Kerberos based authentication and certificate based authentication. The former sends the replication traffic between the two servers/sites over HTTP while the latter sends it over HTTPS. Network is a precious commodity and any optimization delivered has a huge impact on the organization’s TCO and the Recovery Point Objective (RPO).

Around a year back, we partnered with the folks from Riverbed in Microsoft’s EEC lab, to publish a whitepaper which detailed the bandwidth optimization of replication traffic sent over HTTP.

A few months back, we decided to revisit the setup with the latest release of RiOS (Riverbed OS which runs in the Riverbed appliance). Using the resources and appliances from EEC and Riverbed, a set of experiments were performed to study the network optimizations delivered by the Riverbed SteelHead appliance. Optimizing SSL traffic has been a tough nut to crack and we saw some really impressive numbers. The whitepaper documenting the results and technology is available here - http://www.microsoft.com/en-us/download/details.aspx?id=42627.

At a high level, in order to optimize HTTPS traffic, the Riverbed SteelHead appliance decrypts the packet from the client (the primary server). It then optimizes the payload and encrypts the payload before sending it to the server side SteelHead appliance over the internet/WAN. The server-side SteelHead appliance decrypts the payload, de-optimizes the traffic and re-encrypts it. The server side appliance finally sends it to the destination server (the replica server) which proceeds to decrypt the replication traffic. The diagram is taken from Riverbed’s user manual and explains the above technology:

When Hyper-V Replica’s inbuilt compression is disabled, the reduction delivered over WAN was ~80%

When Hyper-V Replica’s inbuilt compression is enabled, the reduction delivered over WAN was ~30%

It’s worth calling out that the % reduction delivered depends on a number of factors such as the workload’s read/write pattern, sparseness of the disk, etc., but the numbers were quite impressive.

In summary, both Hyper-V Replica and the SteelHead devices were easy to configure and worked “out of the box”. Neither product required specific configurations to light up the scenario. The Riverbed appliance delivered a ~30% reduction on compressed, encrypted Hyper-V Replica traffic and a ~80% reduction on uncompressed, encrypted Hyper-V Replica traffic.

↧

TechEd North America 2014

May 8, 2014, 10:02 pm

This year TechEd North America is happening between 12 May and 15 May, in Houston. There are some interesting sessions around Backup and Disaster Recovery, so I would highly encourage you all to attend these sessions and interact with the folks presenting.

Sessions on May 12

Sessions on May 15

Looking forward to seeing you all there!

↧

Excluding virtual disks in Hyper-V Replica

May 11, 2014, 4:00 am

Since its introduction in Windows Server 2012, Hyper-V Replica has provided a way for users to exclude specific virtual disks from being replicated. This option is rarely exercised but can have significant benefits when used correctly. This blog post covers the disk exclusion scenarios and the impact this has on the various operations done during the lifecycle of VM replication. This blog post has been co-authored by Priyank Gaharwar of the Hyper-V Replica test team.

Why exclude disks?

Excluding disks from replication is done because:

1) The data churned on the excluded disk is not important or doesn’t need to be replicated (and)
2) Storage and network resources can be saved by not replicating this churn

Point #1 is worth elaborating on a little. What data isn't “important”? The lens used to judge the importance of replicated data is its usefulness at the time of Failover. Data that is not replicated should also not be needed at the time of failover. Lack of this data would then also not impact the Recovery Point Objective (RPO) in any material way.

There are some specific examples of data churn that can be easily identified and are great candidates for exclusion, for example page file writes. Depending on the workload and the storage subsystem, the page file can register a significant amount of churn. However, replicating this data from the primary site to the replica site would be resource intensive and yet completely worthless. Thus the replication of a VM with a single virtual disk having both the OS and the page file can be optimized by:

Splitting the single virtual disk into two virtual disks – one with the OS, and one with the page file
Excluding the page file disk from replication

How to exclude disks

Application impact - isolating the churn to a separate disk

The first step in using this feature is to isolate the superfluous churn on to a separate virtual disk, similar to what is described above for page files. This is a change to the virtual machine and to the guest. Depending on how your VM is configured and what kind of disk you are adding (IDE, SCSI), you may have to power off your VM before any changes can be made.

At the end, an additional disk should surface up in the guest. Appropriate configuration changes should be done in the application to change the location of the temporary files to point to the newly added disk.

Figure 1: Changing the location of the System Page File to another disk/volume

Excluding disks in the Hyper-V Replica UI

Right-click on a VM and select “Enable Replication…”. This will bring up the wizard that walks you through the various inputs required to enable replication on the VM. The screen titled “Choose Replication VHDs” is where you deselect the virtual disks that you do not want to replicate. By default, all virtual disks will be selected for replication.

Figure 2: Excluding the page file virtual disk from a virtual machine

Excluding disks using PowerShell

The Enable-VMReplication cmdlet provides two optional parameters: -ExcludedVhd and -ExcludedVhdPath. These parameters should be used to exclude the virtual disks at the time of enabling replication.

PS C:\Windows\system32> Enable-VMReplication -VMName SQLSERVER -ReplicaServerName repserv01.contoso.com -AuthenticationType Kerberos -ReplicaServerPort 80 -ExcludedVhdPath 'D:\Primary-Site\Hyper-V\Virtual Hard Disks\SQL-PageFile.vhdx'

After running this command, you will be able to see the excluded disks under VM Settings > Replication > Replication VHDs.
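The other parameter, -ExcludedVhd, takes hard disk drive objects rather than paths. A minimal sketch, reusing the VM and server names from the command above and assuming the page file disk can be recognised by its file name:

```powershell
# Assumption: the disk to exclude is identifiable by its file name.
$pageFileDisk = Get-VMHardDiskDrive -VMName SQLSERVER |
    Where-Object { $_.Path -like '*SQL-PageFile.vhdx' }

Enable-VMReplication -VMName SQLSERVER `
                     -ReplicaServerName repserv01.contoso.com `
                     -ReplicaServerPort 80 `
                     -AuthenticationType Kerberos `
                     -ExcludedVhd $pageFileDisk

# List what ended up excluded from, and included in, replication
Get-VMReplication -VMName SQLSERVER | Select-Object -ExpandProperty ExcludedDisks
Get-VMReplication -VMName SQLSERVER | Select-Object -ExpandProperty ReplicatedDisks
```

The last two lines give the same information as the VM Settings page, which is handy when checking many VMs.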

Figure 3: List of disks included for and excluded from replication

Impact of disk exclusion

Enable replication	A placeholder disk (for use during initial replication) is not created on the Replica VM. The excluded disk doesn’t exist on the replica in any form.
Initial replication	The data from the excluded disks are not transferred to the replica site.
Delta replication	The churn on any of the excluded disks is not transferred to the replica site.
Failover	The failover is initiated without the disk that has been excluded. Applications that refer to the disk/volume in the guest will have incorrect configurations. For page files specifically, if the page file disk is not attached to the VM before VM boot up, then the page file location is automatically shifted to the OS disk.
Resynchronization	The excluded disk is not part of the resynchronization process.

Ensuring a successful failover

Most applications have configurable settings that make use of file system paths. In order to run correctly, the application expects these paths to be present. The key to a successful failover and an error-free application startup is to ensure that the configured paths are present where they should be. In the case of file system paths associated with the excluded disk, this means updating the Replica VM by adding a disk - along with any subfolders that need to be present for the application to work correctly.

The prerequisites for doing this correctly are:

The disk should be added to the Replica VM before the VM is started. This can be done at any time after initial replication completes, but is preferably done immediately after the VM has failed over.
The disk should be added to the Replica VM with the exact controller type, controller number, and controller location as the disk has on the primary.

There are two ways of making a virtual disk available for use at the time of failover:

1) Copy the excluded disk manually (once) from the primary site to the replica site
2) Create a new disk, and format it appropriately (with any folders if required)

When possible, option #2 is preferred over option #1 because of the resources saved from not having to copy the disk. The following PowerShell script can be used to green-light option #2, focusing on meeting the prerequisites to ensure that the Replica VM is exactly the same as the primary VM from a virtual disk perspective:

param (
    [string]$VMNAME,
    [string]$PRIMARYSERVER
)

## Get VHD details from primary, replica
$excludedDisks = Get-VMReplication -VMName $VMNAME -ComputerName $PRIMARYSERVER | select ExcludedDisks
$includedDisks = Get-VMReplication -VMName $VMNAME | select ReplicatedDisks

## Nothing to do if replication was not found or no disks were excluded
if( $excludedDisks -eq $null -or -not $excludedDisks.ExcludedDisks ) { exit }

## Get location of first replica VM disk
$replicaPath = $includedDisks.ReplicatedDisks[0].Path | Split-Path -Parent

## Create and attach each excluded disk
foreach( $exDisk in $excludedDisks.ExcludedDisks )
{
    # Get the actual disk object
    $pDisk = Get-VHD -Path $exDisk.Path -ComputerName $PRIMARYSERVER
    $pDisk

    # Create a new VHD on the Replica with the same geometry as the primary disk
    $diskpath = $replicaPath + "\" + ($pDisk.Path | Split-Path -Leaf)
    $newvhd = New-VHD -Path $diskpath `
                      -SizeBytes $pDisk.Size `
                      -Dynamic `
                      -LogicalSectorSizeBytes $pDisk.LogicalSectorSize `
                      -PhysicalSectorSizeBytes $pDisk.PhysicalSectorSize `
                      -BlockSizeBytes $pDisk.BlockSize `
                      -Verbose
    if( $newvhd -eq $null )
    {
        Write-Host "It is assumed that the VHD [" ($pDisk.Path | Split-Path -Leaf) "] already exists and has been added to the Replica VM [" $VMNAME "]"
        continue
    }

    # Mount and format the new VHD
    $newvhd | Mount-VHD -PassThru -Verbose `
            | Initialize-Disk -PassThru -Verbose `
            | New-Partition -AssignDriveLetter -UseMaximumSize -Verbose `
            | Format-Volume -FileSystem NTFS -Confirm:$false -Force -Verbose

    # Unmount the disk
    $newvhd | Dismount-VHD -PassThru -Verbose

    # Attach disk to Replica VM in the same controller slot as on the primary
    Add-VMHardDiskDrive -VMName $VMNAME `
                        -ControllerType $exDisk.ControllerType `
                        -ControllerNumber $exDisk.ControllerNumber `
                        -ControllerLocation $exDisk.ControllerLocation `
                        -Path $newvhd.Path `
                        -Verbose
}

The script can also be customized for use with Azure Hyper-V Recovery Manager, but we’ll save that for another post!

Capacity Planner and disk exclusion

The Capacity Planner for Hyper-V Replica allows you to forecast your resource needs. It allows you to be more precise about the replication inputs that impact the resource consumption – such as the disks that will be replicated and the disks that will not be replicated.

Figure 4: Disks excluded for capacity planning

Key Takeaways

Excluding virtual disks from replication can save on storage, IOPS, and network resources used during replication
At the time of failover, ensure that the excluded virtual disk is attached to the Replica VM
In most cases, the excluded virtual disk can be recreated on the Replica side using the PowerShell script provided

↧

Application consistent recovery points with Windows Server 2008/2003 guest OS

May 19, 2014, 10:00 pm

I recently had a conversation with a customer around a very interesting problem, and the insights that were gained there are worth sharing. The issue was about VSS errors popping up in the guest event viewer while Hyper-V Replica reported the successful creation of application-consistent (VSS-based) recovery points.

Deployment details

The customer had the following setup that was throwing errors:

Primary site: Hyper-V Cluster with Windows Server 2012 R2
Replica site: Hyper-V Cluster with Windows Server 2012 R2
Virtual machines: SQL Server instances with SQL Server 2012 SP1, SQL Server 2005, and SQL Server 2008

At the time of enabling replication, the customer selected the option to create additional recovery points and set the “Volume Shadow Copy Service (VSS) snapshot frequency” to 1 hour. This means that every hour the VSS writer of the guest OS would be invoked to take an application-consistent snapshot.

Symptoms

With this configuration, there was a contradiction in the output – the guest event viewer showed errors and failures during the VSS process, while the Replica VM showed application-consistent points in the recovery history.

Here is an example of the error registered in the guest:

SQLVM: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=2644. Thread=7212. Client. Instance=. VD=Global\******* BACKUP failed to complete the command BACKUP DATABASE model. Check the backup application log for detailed messages. BackupVirtualDeviceFile::SendFileInfoBegin:  failure on backup device '{********-63**-49**-BA**-5DB6********}1'. Operating system error 995(error not found).

Root cause and dealing with the errors

The big question was: Why was Hyper-V Replica showing application-consistent recovery points if there are failures?

The behavior seen by the customer is a benign error caused by the interaction between Hyper-V and VSS, especially for older versions of the guest OS. Details about this can be found in the KB article here: http://support.microsoft.com/kb/2952783

The Hyper-V requestor explicitly stops the VSS operation right after the OnThaw phase. While this ensures application-consistency of the writes going to the disk, it also results in the VSS errors being logged. Meanwhile, Hyper-V returns the consistency correctly to Hyper-V Replica, which in turn makes sure that the recovery side shows application-consistent points.

A great way to validate whether a recovery point is application-consistent is to do a test failover on that recovery point. After the VM has booted up, check the event viewer logs in the guest: if they contain events pertaining to a rollback, the point is not application-consistent.
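
The validation described above can be sketched in PowerShell. This is an illustration under stated assumptions rather than a script from the original post: Select-LatestAppConsistentPoint is a hypothetical helper, the VM name 'SQLVM' is a placeholder, and the commented-out Hyper-V calls must be run on the Replica server.

```powershell
# Hypothetical helper (illustration only): picks the newest recovery point
# whose SnapshotType is AppConsistentReplica, or returns nothing if none exist.
function Select-LatestAppConsistentPoint {
    param([object[]]$RecoveryPoints)

    $RecoveryPoints |
        Where-Object { $_.SnapshotType -eq 'AppConsistentReplica' } |
        Sort-Object -Property CreationTime -Descending |
        Select-Object -First 1
}

# Possible usage on the Replica server (requires the Hyper-V module;
# 'SQLVM' is a placeholder VM name):
# $point = Select-LatestAppConsistentPoint -RecoveryPoints (Get-VMSnapshot -VMName 'SQLVM')
# if ($point) { Start-VMFailover -VMRecoverySnapshot $point -AsTest }
#
# Start the test VM that Hyper-V creates, inspect the guest event logs for
# rollback events, and then clean up the test failover:
# Stop-VMFailover -VMName 'SQLVM'
```

Checking the guest event logs is left as a manual step here, because the exact events to look for depend on the guest OS and the application.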

Key Takeaways

All in all, you can rest assured that in the case of VMs with older operating systems, Hyper-V Replica is correctly taking an application-consistent snapshot of the virtual machine.
Although there are errors seen in the guest, they are benign and having a recovery history with application-consistent points is an expected behavior.

↧

Disaster Recovery to Microsoft Azure – Part 1

June 20, 2014, 4:34 am

Drum roll please!

We are super excited to announce the availability of the preview bits of Azure Site Recovery (ASR) which enables you to replicate Hyper-V VMs to Microsoft Azure for business continuity and disaster recovery purposes.

You can now protect, replicate, and fail over VMs directly to Microsoft Azure – our guarantee remains that whether you enable Disaster Recovery across On-Premises Enterprise Private Clouds or directly to Azure, your virtualized workloads will be recovered accurately, consistently, with minimal downtime and with minimal data loss.

ASR supports Automated Protection and Replication of VMs, customizable Recovery Plans that enable One-Click Recovery, No-Impact Recovery Plan Testing (ensures that you meet your Audit and Compliance requirements), and best-in-class Security and Privacy features that offer maximum resilience to your business critical applications. All this with minimal cost and without the need to invest in a recovery datacenter. To know more about this announcement and what we have enabled in the Preview, check out Brad Anderson’s In the Cloud blog.

We will cover this feature in detail in the coming weeks – stay tuned and try out the feature. We would love to hear your feedback!

↧