
Azure Site Recovery - FAQ


Quick post to clarify some frequently asked questions on the newly announced Azure Site Recovery service which enables you to protect your Hyper-V VMs to Microsoft Azure. The FAQ will not address every feature capability - it should help you get started.

Q1: Did you just change the name from Hyper-V Recovery Manager to Azure Site Recovery?

A: Nope – we did more than that. Yes, we rebranded Hyper-V Recovery Manager to Azure Site Recovery (ASR), but we also brought in a bunch of new features. This includes the much-awaited capability to replicate virtual machines (VMs) to Microsoft Azure. With this feature, ASR now orchestrates replication and recovery between private clouds as well as from a private cloud to Azure.

Q2: What did you GA in Jan 2014?

A: In Jan 2014, we announced the general availability of Hyper-V Recovery Manager (HRM), which enabled you to manage and orchestrate protection & recovery workflows for *your* private clouds. You (as a customer) owned both the primary and secondary datacenters, which were managed by SCVMM. Built on top of Windows Server 2012/R2 Hyper-V Replica, we offered a cloud-integrated disaster recovery solution.

Q3: HRM was an Azure service but data was replicated between my datacenters? And this continues to work?

A: Yes on both counts. The service was being used to provide the “at-scale” protection & recovery of VMs.

Q4: What is in preview as of June 2014 (now)?

A: The rebranded service now has an added capability to protect VMs to Azure (=> Azure is your secondary datacenter). If your primary machine/server/VM is down due to a planned/unplanned event, you can recover the replicated VM in Azure. You can also bring back (or failback) your VM to your private cloud once it’s recovered from a disaster.

Q5: Wow, so I don’t need a secondary datacenter?

A: Exactly. You don’t need to invest in and maintain a secondary DC. You can reap the benefits of Azure’s SLAs by protecting your VMs in Azure. The replica VM does *NOT* run in Azure until you initiate a failover.

Q6: Where is my data stored?

A: Your data is stored in *your* storage account on top of world class geo-redundant storage provided by Azure.

Q7: Do you encrypt my replica data?

A: Yes. You can also optionally encrypt the data; you own & manage the encryption key, and Microsoft never requires it until you opt to fail over the VM in Azure.

Q8: And my VM needs to be part of a SCVMM cloud?

A: Yes. For the current preview release, we need your VMs to be part of a SCVMM managed cloud. Check out the benefits of SCVMM @ http://technet.microsoft.com/en-us/library/dn246490.aspx

Q9: Can I protect any guest OS?

A: Your protection and recovery strategy is tied to Microsoft Azure’s supported operating systems. You can find more details in http://msdn.microsoft.com/en-us/library/azure/dn469078.aspx under the “Virtual Machines support – on premises to Azure” section.

Q10: Ok, but what about the host OS on-premises?

A: For the current preview release, the host OS should be Windows Server 2012 R2.

In summary, you can replicate any supported Windows and Linux SKU mentioned in Q9 running on top of a Windows Server 2012 R2 Hyper-V server.

Q11: Can I replicate Gen-2 VMs on Windows Server 2012 R2?

A: For the preview release, you can protect only Generation 1 VMs. Trying to protect a Gen-2 VM will fail with an appropriate error message.

Q12: Is the product guest-agnostic or should I upload any agent?

A: The on-premises technology is built on top of Windows Server 2012 Hyper-V Replica which is guest, workload and storage agnostic.

Q13: What about disks and disk geometries?

A: We support all combinations of VHD/VHDX with fixed, dynamic, and differencing disks.

Q14: Any restrictions on the size of the disks?

A: There are certain restrictions on the size of the disks of IaaS VMs on Azure, so check the current limits on OS and data disk sizes before enabling protection.

Azure is a rapidly evolving platform, and these restrictions are applicable as of June 2014.

Q15: Any gotchas with network configuration or memory assigned to the VM?

A: As with the previous question, when you fail over your VM, you will be bound by Azure’s IaaS VM offerings/features. As of today, Azure supports one network adapter and up to 112GB of memory (in the A9 VM). The product does not put a hard block in place if you have a different network and/or memory configuration on-premises. You can change the parameters with which the VM will be created in the Azure portal under the Recovery Services option.

Q16: Where can I find information about the product, pricing etc?

A: To learn more about Azure Site Recovery, pricing, and documentation, visit http://azure.microsoft.com/en-us/services/site-recovery/

Q17: Is there any document explaining the workflows?

A: You can refer to the getting-started-guide @ http://azure.microsoft.com/en-us/documentation/articles/hyper-v-recovery-manager-azure/ or post a question in our forums (see below)

Q18: I faced some errors when using the product; is there an MSDN forum where I can post my query?

A: Yes, please post your questions, queries @ http://social.msdn.microsoft.com/Forums/windowsazure/en-US/home?forum=hypervrecovmgr

Q19: But I really feel strongly about some of the features and I would like to share my feedback with the PG. Can I comment on the blog?

A: We love to hear your feedback – feel free to leave comments on any of our blog articles. But a more structured approach would be to post your suggestions @ http://feedback.azure.com/forums/256299-site-recovery

Q20: Will you build everything which I suggest?

A: Of course…not :) But on a serious note – we absolutely love to hear from you. So don’t be shy with your feedback.


Out-of-band Initial Replication (OOB IR) and Deduplication


A recent conversation with a customer brought out the question: what is the best way to create an entire Replica site from scratch? On the surface this seems simple enough – configure initial replication to send the data over the network for the VMs one after another in sequence. For this specific customer, however, there were some additional constraints:

  1. The network bandwidth was less than 10Mbps, and it primarily catered to their daily business needs (email, etc.). Adding more bandwidth was not possible within their budget. This came as quite a surprise: despite the incredible download speeds available these days, there are still places in the world where it isn't cost effective to purchase them.
  2. The VMs were between 150GB and 300GB each in size. This made it rather impractical to send the data over the wire: in the best case, it would have taken about 34 hours for a single 150GB VM (see the quick calculation after this list).
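That 34-hour figure is simple arithmetic; here is a quick sanity check in PowerShell (treating a megabit as 2^20 bits, which matches the post's estimate; decimal megabits give roughly 36 hours):

$vmSizeGB = 150     # smallest VM in this scenario
$linkMbps = 10      # assume the full link is available for replication
$seconds  = ($vmSizeGB * 1GB * 8) / ($linkMbps * 1MB)   # bits to transfer / bits per second
"{0:N1} hours" -f ($seconds / 3600)                     # ~34.1 hours, ignoring protocol overhead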

This left OOB IR as the only realistic way to transfer data. But at 300GB per VM, it is easy to exhaust a removable drive of 1TB. That left us thinking about deduplication – after all, deduplication is supported on the Replica site. So why not use it for deduplicating OOB IR data?

So I tested this out in my lab environment with a removable USB drive and a bunch of VMs created from the same Windows Server 2012 VHDX file. The expectation was that at least 20% to 40% of the data would be the same across the VMs, that the overall deduplication rate would be quite high, and that we could fit a good number of VMs onto the removable USB drive.

I started this experiment by attaching the removable drive to my server and attempted to enable deduplication on the associated volume in Server Manager.

Interesting discovery #1:  Deduplication is not allowed on volumes on removable disks

Whoops! This seems like a fundamental block to our scenario – how do you build deduplicated OOB IR data if deduplication is not supported on removable media? This limitation is officially documented here: http://technet.microsoft.com/en-us/library/hh831700.aspx, which says: “Volumes that are candidates for deduplication must conform to the following requirements:  Must be exposed to the operating system as non-removable drives. Remotely-mapped drives are not supported.”

Fortunately my colleague Paul Despe in the Windows Server Data Deduplication team came to the rescue. There is a (slightly) convoluted way to get the data on the removable drive and deduplicated. Here goes:

  • Create a dynamically expanding VHDX file. The size doesn’t matter too much as you can always start off with the default and expand if required.

image

  • Using Disk Management, bring the disk online, initialize it, create a single volume, and format it with NTFS. You should be able to see the new volume in your Explorer window. I used Y:\ as the drive letter.

image

  • Mount this VHDX on the server you are using to do the OOB IR process.
  • If you go to Server Manager and view this volume (Y:\), you will see that it is backed by a fixed disk.

image

  • In the volume view, enable deduplication on this volume by right-clicking and selecting ‘Configure Data Deduplication’. Set the ‘Deduplicate files older than (in days)’ field to zero.

image

image

You can also enable deduplication in PowerShell with the following commandlets:

PS C:\> Enable-DedupVolume Y: -UsageType HyperV
PS C:\> Set-DedupVolume Y: -MinimumFileAgeDays 0
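The preparation above can also be scripted end to end, and the OOB IR itself is driven by the standard Hyper-V Replica cmdlets. Here is a rough sketch under some assumptions: the VM is named VM01, the Replica server is replica01 with Kerberos/HTTP replication already configured, and the paths are illustrative only.

# Create, mount and format the dynamically expanding VHDX that will hold the OOB IR data
New-VHD -Path 'D:\OOBIR\dedup-store.vhdx' -Dynamic -SizeBytes 1TB |
    Mount-VHD -Passthru |
    Initialize-Disk -Passthru |
    New-Partition -DriveLetter Y -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel 'OOBIR' -Confirm:$false

# Enable deduplication on Y: (as shown above), then enable replication for the VM
# using out-of-band (external media) initial replication pointed at the new volume
Enable-VMReplication -VMName 'VM01' -ReplicaServerName 'replica01' `
    -ReplicaServerPort 80 -AuthenticationType Kerberos
Start-VMInitialReplication -VMName 'VM01' -DestinationPath 'Y:\'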

Now you are set to start the OOB IR process and take advantage of the deduplicated volume. This is what I saw after 1 VM was enabled for replication with OOB IR:

image

image

That’s about 32.6GB of storage used. Wait… shouldn’t there be a reduction in size because of deduplication?

Interesting discovery #2:  Deduplication doesn’t work on-the-fly

Ah… so if you were expecting the VHD data to arrive in the volume in deduplicated form, this is going to be a bit of a surprise. At first, the VHD data will be present in the volume at its original size. Deduplication happens post-facto, as a job that crunches the data and reduces the size of the VHD after it has been fully copied as part of the OOB IR process. This is because deduplication needs an exclusive handle on the file in order to do its work.

The good part is that you can trigger the job on-demand and start the deduplication as soon as the first VHD is copied. You can do that by using the PowerShell commandlet provided:

PS C:\> Start-DedupJob Y: -Type Optimization

There are other parameters provided by the commandlet that allow you to control the deduplication job. You can explore the various options in the TechNet documentation: http://technet.microsoft.com/en-us/library/hh848442.aspx.
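If you want to keep an eye on the job and on the resulting savings, the deduplication module also exposes status cmdlets (shown here for the Y: volume used above):

PS C:\> Get-DedupJob -Volume Y:       # progress of the running optimization job
PS C:\> Get-DedupStatus -Volume Y:    # saved space and optimized file counts once it completes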

This is what I got after the deduplication job completed:

image

That’s a 54% saving with just one VM – a very good start!

Deduplication rate with more virtual machines

After this I threw in a few more virtual machines with completely different applications installed, and here are the observed savings after each step:

image

I think the excellent results speak for themselves! Notice how between VM2 and VM3, almost all of the data (~9GB) has been absorbed by deduplication, with an increase of only 300MB! As the deduplication team has published on TechNet, VDI VMs would have a high degree of similarity in their disks and would result in a much higher deduplication rate. A random mix of VMs yields surprisingly good results as well.

Final steps

Once you are done with the OOB IR and deduplication of your VMs, you need to do the following steps:

  1. Ensure that no deduplication job is running on the volume
  2. Eject the fixed disk – this should disconnect the VHD from the host
  3. Compact the VHD using the “Edit Virtual Hard Disk Wizard”. At the time I disconnected the VHD from the host, the size of the VHD was 36.38GB. After compacting it, the size came down to 28.13GB, which is more in line with the actual disk consumption you see in the graph above
  4. Copy the VHD to the Replica site, mount it on the Replica host, and complete the OOB IR process! (A PowerShell sketch of steps 2 through 4 follows this list.)
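Here is that sketch, assuming the same VHDX path as in the earlier example and a VM named VM01 (names, paths and the Z: mount point on the Replica host are illustrative):

# Primary side: dismount the VHDX from the host (step 2), then compact it (step 3).
# Full optimization wants the VHDX mounted read-only while it runs.
Dismount-VHD -Path 'D:\OOBIR\dedup-store.vhdx'
Mount-VHD    -Path 'D:\OOBIR\dedup-store.vhdx' -ReadOnly
Optimize-VHD -Path 'D:\OOBIR\dedup-store.vhdx' -Mode Full
Dismount-VHD -Path 'D:\OOBIR\dedup-store.vhdx'

# Replica host: after copying the compacted VHDX over and mounting it (say as Z:),
# complete the out-of-band initial replication (step 4)
Import-VMInitialReplication -VMName 'VM01' -Path 'Z:\'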

 

Hope this blog post helps with setting up your own Hyper-V Replica sites from scratch using OOB IR! Try it out and let us know your feedback.

Announcing the GA of Disaster Recovery to Azure using Azure Site Recovery


I am excited to announce the GA of the Disaster Recovery to Azure using Azure Site Recovery. In addition to enabling replication to and recovery in Microsoft Azure, ASR enables automated protection of VMs, remote health monitoring, no-impact recovery plan testing, and single click orchestrated recovery - all backed by an enterprise-grade SLA.

The DR to Azure functionality in ASR builds on top of System Center Virtual Machine Manager, Windows Server Hyper-V Replica, and Microsoft Azure to ensure that our customers can leverage existing IT investments while still helping them optimize precious CAPEX and OPEX spent in building and managing secondary datacenter sites.

The GA release also brings significant additions to the already expansive list of ASR’s DR to Azure features:

  • NEW ASR Recovery Plans and Azure Automation integrate to offer robust and simplified one-click orchestration of your DR plans
  • NEW Track Initial Replication Progress as virtual machine data gets replicated to a customer-owned and managed geo-redundant Azure Storage account. This new feature is also available when configuring DR between on-premises private clouds across enterprise sites
  • NEW Simplified Setup and Registration streamlines the DR setup by removing the complexity of generating certificates and integrity keys needed to register your on-premises System Center Virtual Machine Manager server with your Site Recovery vault

Announcing GA of Disaster Recovery to Azure - Purpose-Built for Branch Offices and SMB


Today, we are excited to announce the GA of Branch Office and SMB Disaster Recovery to Azure.  Azure Site Recovery delivers a simple, reliable & cost-effective Disaster Recovery solution to Branch Office and SMB customers.  With the new Hyper-V Virtual Machine Protection from Windows Server 2012 R2 to Microsoft Azure, ASR can now be used at customer-owned sites, and SCVMM is optional.

 

You can visit the Getting Started with Azure Site Recovery page for additional information.

Replicate Azure Pack IaaS Workloads to Azure using ASR


A few months back, we announced Azure Site Recovery (ASR)’s integration with Azure Pack, which enabled our Service Providers to start offering Managed Disaster Recovery for IaaS Workloads using ASR and Azure Pack. The response since we announced the integration has been phenomenal - customers are appreciating the simplicity of ASR and are adopting ASR as the standard for offering a DR solution in their environments. The integration of ASR and Azure Pack also enabled Service Providers to offer DR as a value-added service, opening up new revenue streams and the ability to offer better Service Level Agreements (SLAs) to their end customers. A key ask that our early adopters have expressed – and that we are delivering on today – is the ability to use Microsoft Azure as a Disaster Recovery Site for IaaS workloads when using Azure Pack.

The new features in the ASR – Azure Pack integration enable Service Providers to offer either Azure or an on-premises secondary site as the DR site from their Azure Pack UR 4.0 deployments. To enable these capabilities, download the latest ASR runbooks and import them into your WAP environments. To learn more about the integration, check out our WAP deployment guide.

Getting started with Azure Site Recovery is easy – simply check out the pricing information, and sign up for a free Microsoft Azure trial.  

Discrete Device Assignment — GPUs


This is the third post in a four-part series.  My previous two blog posts talked about Discrete Device Assignment (link) and the machines and devices necessary (link) to make it work in Windows Server 2016 TP4. This post goes into more detail, focusing on GPUs.

There are those of you out there who want to get the most out of Photoshop, or CATIA, or some other thing that just needs a graphics processor, or GPU. If that’s you, and if you have GPUs in your machine that aren’t needed by the Windows management OS, then you can dismount them and pass them through to a guest VM.
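For those who want to try it, the dismount-and-assign flow uses the Hyper-V and PnpDevice cmdlets in Windows Server 2016 TP4. The sketch below is only an outline: the display-adapter filter, the location path handling and the VM name are placeholders you will need to adapt, and it assumes a single matching GPU.

# Find the GPU and its PCI Express location path
$gpu          = Get-PnpDevice -Class Display | Where-Object { $_.FriendlyName -match 'NVIDIA|AMD' }
$locationPath = (Get-PnpDeviceProperty -InstanceId $gpu.InstanceId `
                    -KeyName 'DEVPKEY_Device_LocationPaths').Data[0]

# Disable the device in the management OS, then dismount it from the host
Disable-PnpDevice -InstanceId $gpu.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force

# Assign it to a (powered-off) VM
Add-VMAssignableDevice -LocationPath $locationPath -VMName 'MyGpuVM'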

GPUs, though, are complicated beasts. People want them to run as fast as they possibly can, and to pump a lot more data through the computer’s memory than almost any other part of the computer. To manage this, GPUs run at the hairy edge of what PCI Express buses can deliver, and the device drivers for the GPUs often tune the GPU and sometimes even the underlying machine, attempting to ensure that you get a reasonable experience.

The catch is that, when you pass a GPU through to a VM, the environment for the GPU changes a little bit. For one thing, the driver can’t see the rest of the machine, to respond to its configuration or to tune things up. Second, access to memory works a little differently when you turn on an I/O MMU, changing timings and such. So the GPU will tend to work if the machine’s BIOS has already set up the GPU optimally, and this limits the machines that are likely to work well with GPUs. Basically, these are servers which were built for hosting GPUs. They’ll be the sorts of things that the salesman wants to push on you when you use words like “desktop virtualization” and “rendering.” When I look at a server, I can tell whether it was designed for GPU work instantly, because it has lots of long (x16) PCI Express slots, really big power supplies and fans that make a spooky howling sound.

We’re working with the GPU vendors to see if they want to support specific GPUs, and they may decide to do that. It’s really their call, and they’re unlikely to make a support statement on more than the few GPUs that are sold into the server market. If they do, they’ll supply driver packages which convert them from being considered “use at your own risk” within Hyper-V to the supported category. When those driver packages are installed, the error and warning messages that appear when you try to dismount the GPU will disappear.

So, if you’re still reading and you want to play around with GPUs in your VMs, you need to know a few other things. First, GPUs can have a lot of memory. And by default, we don’t reserve enough space in our virtual machines for that memory. (We reserve it for RAM that you might add through Dynamic Memory instead, which is the right choice for most users.) You can find out how much memory space your GPU uses by looking at it in Device Manager, or through scripts by looking at the WMI Win32_PnPAllocatedResource class.

[screenshot: the GPU's memory ranges as shown in Device Manager]

The screenshot above is from the machine I’m using to type this. You can see two memory ranges listed, with beginning and end values expressed in hexadecimal. Doing the conversion to more straightforward numbers, the first range (the video memory, mostly) is 256MB and the second one (video setup and control registers) is 128KB. So any VM you want to use this GPU with would need at least 257MB of free space within it.
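If you prefer a script to clicking through Device Manager, here is a hedged sketch of that WMI query (the GPU name filter is an example, and the output parsing may need adjusting on your system):

# List the memory ranges allocated to PnP devices, then filter for the GPU of interest
Get-WmiObject -Class Win32_PnPAllocatedResource |
    Where-Object { $_.Antecedent -match 'Win32_DeviceMemoryAddress' } |
    ForEach-Object {
        [PSCustomObject]@{
            Device = ([wmi]$_.Dependent).Name    # friendly device name
            Range  = ([wmi]$_.Antecedent).Name   # e.g. 0xF6000000-0xF6FFFFFF
        }
    } |
    Where-Object { $_.Device -match 'NVIDIA|AMD|GeForce|Radeon' }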

In Hyper-V within Server 2016 TP4, there are two types of VMs, Generation 1 and Generation 2. Generation 1 is intended to run older 32-bit operating systems and 64-bit operating systems which depend on the VM having a structure very like a PC. Generation 2 is intended for 64-bit operating systems which don’t depend on a PC architecture.

A Generation 1 VM, because it is intended to run 32-bit code, attempts to reserve as much as possible in the VM for RAM in the 32-bit address space. This leaves very little 32-bit space available for GPUs. There is, however, by default, 512MB of space available that 64-bit OS code can use.

A Generation 2 VM, because it is not constrained by 32-bit code, has about 2GB of space that could have any GPU placed in it. (Some GPUs require 32-bit space and some don’t, and it’s difficult to tell the difference without just trying it.)

Either type of VM, however, can be reconfigured so that there’s more space in it for GPUs. If you want to reserve more space for a GPU that needs 32-bit space, you can use PowerShell:

Set-VM pickyourvmname -LowMemoryMappedIoSpace upto3000MBofmemoryhere

Similarly, if you want to reserve memory for GPUs above 32-bit space:

Set-VM pickyourvmname -HighMemoryMappedIoSpace upto33000MBofmemoryhere

Note that, if your GPU supports it, you can have a lot more space above 32-bits.

Lastly, GPUs tend to work a lot faster if the processor can run in a mode where bits in video memory can be held in the processor’s cache for a while before they are written to memory, waiting for other writes to the same memory. This is called “write-combining.” In general, this isn’t enabled in Hyper-V VMs. If you want your GPU to work well, you’ll probably need to enable it:

Set-VM pickyourvmname -GuestControlledCacheTypes $true

None of these settings above can be applied while the VM is running.

Happy experimenting!

– Jake Oshins

Discrete Device Assignment — Guests and Linux


In my previous three posts, I outlined a new feature in Windows Server 2016 TP4 called Discrete Device Assignment. This post talks about support for Linux guest VMs, and it’s more of a description of my personal journey rather than a straight feature description. If it were just a description, I could do that in one line: We’re contributing code to Linux to make this work, and it will eventually hit a distro near you.

Microsoft has cared a lot about supporting Linux in a Hyper-V VM for a while now. We have an entire team of people dedicated to making that work better. Microsoft Azure, which uses Hyper-V, hosts many different kinds of Linux, and they’re constantly tuning to make that even better. What’s new for me is that I built the Linux front-end code for PCI Express pass-through, rather than asking a separate team to do it. I also built PCI Express pass-through for Windows (along with a couple of other people) for Windows Server 2012, which showed up as SR-IOV for networking, and Discrete Device Assignment for Windows Server 2016 TP4, which is built on the same underpinnings.

When I first started out to work on PCIe pass-through, I realized that the changes to Hyper-V would be really extensive. Fundamentally, we’re talking about distributing ownership of the devices inside your computer among multiple operating systems, some of which are more trusted than others. I was also trying to make sure that giving a device to an untrusted guest OS wouldn’t result in that OS taking the device hostage, making it impossible to do anything related to plug-and-play in the rest of the machine. Imagine if I told you that you couldn’t hot-plug another NVMe drive into your system because your storage utility VM wasn’t playing along, or even worse yet, your SQL server with SR-IOV networking was on-line and not in a state to be interrupted.

I actually spent a few months thinking about the problem (back in 2008 and ‘09) and how to solve it. The possible interactions between various OSes running on the same machine were combinatorically scary. So I went forth with some guiding principles, two of which were: Allow as few interactions as possible while still making it all work and keep the attack surface from untrusted VMs to an absolute minimum. The solution involved replacing the PCI driver in the guest OS with one that batched up lots of things into a few messages, passed through Hyper-V and on to the PnP Manager in the management OS, which manages the hardware as a whole.

When I went to try to enable Linux, however, I discovered that my two principles had caused me to come up with a protocol that was perfectly tailored to what Windows needs as a guest OS. Anything that would have made it easier to accommodate Linux got left out as “extra attack surface” or “another potential failure path that needs to be handled,” even though Linux does all the same things as Windows, but just a little differently. So the first step in enabling PCIe pass-through for Linux guests was actually to add some new infrastructure to Hyper-V. I still tried to minimize attack surface, and I think that I’ve added only that which was strictly necessary.

One of the challenges is that Linux device drivers are structured very differently from Windows device drivers. In Windows, a driver never interacts directly with the underlying driver for the bus itself. Windows drivers send I/O Request Packets (or IRPs) down to the bus driver. If they need to call a function in the bus driver in a very light-weight fashion, they send an IRP to the bus driver asking for a pointer to that function. This makes it possible, and even relatively easy, to replace the bus driver entirely, which is what we did for Windows. We replaced PCI.sys with vPCI.sys. vPCI.sys knows that it’s running in a virtual machine and that it doesn’t control the actual underlying hardware.

Linux has a lot of flexibility around PCI, of course. It runs on a vastly wider gamut of computers than Windows does. But instead of allowing the underlying bus driver to be replaced, Linux accommodates these things by allowing a device driver to supply low-level functions to the PCI code which do things like scan the bus and set up IRQs. These low-level functions required very different underlying support from Hyper-V.

As part of this, I’ve learned how to participate in open source software development, sending iteration upon iteration of patches to a huge body of people who then tell me that I don’t know anything about software development, with helpful pointers to their blogs explaining the one true way. This process is actually ongoing. Below is a link to the latest series of changes that I’ve sent in. Given that there hasn’t been any comment on it in a couple of weeks, it seems fairly likely that this, or something quite like this, will eventually make it into “upstream” kernels.

https://lkml.org/lkml/2015/11/2/672

Once it’s in the upstream kernels, the various distributions (Ubuntu, SUSE, RHEL, etc.) will eventually pick up the code as they move on to newer kernels. They can each individually choose to include the code or not, at their discretion, though most of the distros offer Hyper-V support by default. We may actually be able to work with them to back-port this to their long-term support products, though that’s far from certain at this point.

So if you’re comfortable patching, compiling and installing Linux kernels, and you want to play around with this, pull down the linux-next tree and apply the patch series. We’d love to know what you come up with.

– Jake Oshins

New Hyper-V Survey


We want to hear from you regarding shielded VMs and troubleshooting!

When you have about 5 to 10 minutes, please take this short survey. The survey will close on February 23rd, 2016, so please submit your answers by then. It is recommended to take the survey on a desktop browser so the questions show up properly.

Thank you and have a great day.

Survey URL: https://www.instant.ly/s/t5uu5?s=vbl

 

Thank you for your participation!

Lars Iwer


Setting up Linux Operating System Clusters on Hyper-V (1 of 3)


Author: Dexuan Cui

Background

When Linux is running on physical hardware, multiple computers may be configured in a Linux operating system cluster to provide high availability and load balancing in case of a hardware failure.  Different clustering packages are available for different Linux distros, but for Red Hat Enterprise Linux (RHEL) and CentOS, Red Hat Cluster Suite is a popular choice to achieve these goals.  A cluster consists of two or more nodes, where each node is an instance of RHEL or CentOS.  Such a cluster usually requires some kind of shared storage, such as iSCSI or fibre channel, that is accessible from all of the nodes.

What happens when Linux is running in a virtual machine guest on a hypervisor, such as you might be using in your on-premises datacenter?  It may still make sense to use a Linux OS cluster for high availability and load balancing.  But how can you create shared storage in such an environment so that it is accessible to all of the Linux guests that will participate in the cluster?  This series of blog posts answers these questions.

Overview

This series of blog posts walks through setting up Microsoft’s Hyper-V to create shared storage that can be used by Linux clustering software.  Then it walks through setting up Red Hat Cluster Suite in that environment to create a five-node Linux OS cluster.  Finally, it demonstrates an example application running in the cluster environment, and how a failover works.

The shared storage is created using Hyper-V’s Shared VHDX feature, which allows the VM users to create a VHDX file, and share that file among the guest cluster nodes as if the shared VHDX file were a shared Serial Attached SCSI disk.  When the Shared VHDX feature is used, the .vhdx file itself still must reside in a location where it is accessible to all the nodes of a cluster. This means it must reside in a CSV (Cluster Shared Volume) partition or in an SMB 3.0 file share. For the example in this blog post series, we’ll use a host CSV partition, which requires a host cluster with an iSCSI target (server).

Note: To understand how clustering works, we need to first understand 3 important concepts in clustering: split-brain, quorum and fencing:

  • “Split-brain” is the idea that a cluster can have communication failures, which can cause it to split into subclusters
  • “Fencing” is the way of ensuring that one can safely proceed in these cases
  • “Quorum” is the idea of determining which subcluster can fence the others and proceed to recover the cluster services

These three concepts will be referenced in the remainder of this blog post series.

The walk-through will be in three blog posts:

  1. Set up a Hyper-V host cluster and prepare for shared VHDX. Then set up five CentOS 6.7 VMs in the host cluster that use the shared VHDX.  These five CentOS VMs will form the Linux OS cluster.
  2. Set up a Linux OS cluster with the CentOS 6.7 VMs running RHCS and the GFS2 file system.
  3. Set up a web server on one of the CentOS 6.7 nodes, and demonstrate various failover cases. Then wrap up with a summary and conclusions.

Let’s get started!

Set up a host cluster and prepare for Shared VHDX

(Refer to Deploy a Hyper-V Cluster, Deploy a Guest Cluster Using a Shared Virtual Hard Disk)

Here we first set up an iSCSI target (server) on iscsi01 and then set up a 2-node Hyper-V host cluster on hyperv01 and hyperv02.  Both nodes of the Hyper-V host cluster are running Windows Server 2012 R2 Hyper-V, with access to the iSCSI shared storage.  The resulting configuration looks like this:

image1

  1. Set up an iSCSI target on iscsi01. (Refer to Installing and Configuring target iSCSI server on Windows Server 2012.) We don’t have to buy real iSCSI hardware; Windows Server 2012 R2 can emulate an iSCSI target based on .vhdx files.
  2. Install “File and Storage Service” on iscsi01 using Server Manager -> Configure this local server -> Add roles and features -> Role-based or feature-based installation -> … -> Server Roles -> File and Storage Service -> iSCSI Target Server (add).

    Then in Server Manager -> File and Storage Service -> iSCSI, use “New iSCSI Virtual Disk…” to create 2 .vhdx files: iscsi-1.vhdx (200GB) and iscsi-2.vhdx (1GB). In “iSCSI TARGETS”, allow hyperv01 and hyperv02 as Initiators (iSCSI clients).

  3. On hyperv01 and hyperv02, use “iSCSI Initiator” to connect to the 2 LUNs on iscsi01. Now, in “Disk Management” on both hosts, 2 new disks should appear: one 200GB and the other 1GB.
  4. On one host only, for example hyperv02, in “Disk Management”, create and format an NTFS partition on the 200GB disk (remember to choose “Do not assign a drive letter or drive path”).

  5. On hyperv01 and hyperv02, install the Failover Clustering feature (which includes the “Failover Cluster Manager” tool): Server Manager -> Configure this local server -> Add roles and features -> Role-based or feature-based installation -> … -> Features -> Failover Clustering.

  6. On hyperv02, with Failover Cluster Manager -> “Create Cluster”, we create a host cluster with the 2 host nodes.
  7. Using “Storage -> Disks | Add Disk”, we add the 2 new disks: the 200GB one is used as a “Cluster Shared Volume” and the 1GB one is used as the Disk Witness in Quorum.  To set the 1GB disk as the Quorum Disk, after “Storage -> Disks | Add Disk”, right-click the cluster name, choose More Actions -> Configure Cluster Quorum Settings… -> Next -> Select the quorum witness -> Configure a disk witness -> ….

  8. Now, on both hosts, a new special shared directory C:\ClusterStorage\Volume1\ appears. (A PowerShell sketch of these host-side steps follows this list.)
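Here is that sketch, with illustrative machine names, paths and sizes; it uses the iSCSITarget, Storage and FailoverClusters modules, and details such as initiator IQNs and cluster disk names will differ in your environment.

# --- On iscsi01: install the iSCSI Target Server role and publish two LUNs ---
Install-WindowsFeature FS-iSCSITarget-Server
New-IscsiVirtualDisk -Path 'C:\iSCSIVirtualDisks\iscsi-1.vhdx' -SizeBytes 200GB
New-IscsiVirtualDisk -Path 'C:\iSCSIVirtualDisks\iscsi-2.vhdx' -SizeBytes 1GB
New-IscsiServerTarget -TargetName 'HyperVCluster' `
    -InitiatorIds 'IQN:iqn.1991-05.com.microsoft:hyperv01.contoso.com',
                  'IQN:iqn.1991-05.com.microsoft:hyperv02.contoso.com'
Add-IscsiVirtualDiskTargetMapping -TargetName 'HyperVCluster' -Path 'C:\iSCSIVirtualDisks\iscsi-1.vhdx'
Add-IscsiVirtualDiskTargetMapping -TargetName 'HyperVCluster' -Path 'C:\iSCSIVirtualDisks\iscsi-2.vhdx'

# --- On hyperv01 and hyperv02: install Failover Clustering, then build the cluster ---
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
New-Cluster -Name 'hvcluster' -Node 'hyperv01','hyperv02'

# Add the two iSCSI disks to the cluster: the 200GB disk becomes the CSV,
# the 1GB disk becomes the disk witness for quorum (disk names may differ)
Get-ClusterAvailableDisk | Add-ClusterDisk
Add-ClusterSharedVolume -Name 'Cluster Disk 1'
Set-ClusterQuorum -NodeAndDiskMajority 'Cluster Disk 2'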

Set up CentOS 6.7 VMs in the host cluster with Shared VHDX

  1. On hyperv02, with Failover Cluster Manager -> “Roles| Virtual Machines | New Virtual Machine” we create five CentOS 6.7 VMs. For the purposes of this walk through, these five VMs are given names “my-vm1”, “my-vm2”, etc., and these are the names you’ll see used in the rest of the walk through.
  2. Make sure to choose “Store the virtual machine in a different location” and choose C:\ClusterStorage\Volume1\.   In other words, my-vm1’s configuration file and .vhdx file are stored in C:\ClusterStorage\Volume1\my-vm1\Virtual Machines\ and C:\ClusterStorage\Volume1\my-vm1\Virtual Hard Disks\.

    You can spread out the five VMs across the two Hyper-V hosts however you like, as both hosts have equivalent access to C:\ClusterStorage\Volume1\.  The schematic diagram above shows three VMs on hyperv01 and two VMs on hyperv02, but the specific layout does not affect the operation of the Linux OS cluster or the subsequent examples in this walk through.

  3. Use static IP addresses and update /etc/hosts in all 5 VMs. (Note: contact your network administrator to make sure the static IPs are reserved for this use.)

    So on my-vm1 in /etc/sysconfig/network-scripts/ifcfg-eth0, we have

    DEVICE=eth0
    TYPE=Ethernet
    UUID=2b5e2f5a-3001-4e12-bf0c-d3d74b0b28e1
    ONBOOT=yes
    NM_CONTROLLED=no
    BOOTPROTO=static
    IPADDR=10.156.76.74
    NETMASK=255.255.252.0
    GATEWAY=10.156.76.1
    DEFROUTE=yes
    PEERDNS=yes
    PEERROUTES=yes
    IPV4_FAILURE_FATAL=yes
    IPV6INIT=no
    NAME="System eth0"

    And in /etc/hosts, we have

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1       localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.156.76.74      my-vm1
    10.156.76.92      my-vm2
    10.156.76.48      my-vm3
    10.156.76.79      my-vm4
    10.156.76.75      my-vm5
  4. On hyperv02, in my-vm1’s “Settings | SCSI Controller”, add a 100GB hard drive by using the “New Virtual Hard Disk Wizard”. Remember to store the .vhdx file in the shared host storage, e.g., C:\ClusterStorage\Volume1\100GB-shared-vhdx.vhdx, and remember to enable “Advanced Features | Enable virtual hard disk sharing”. Next we add the .vhdx file to the other 4 VMs with disk sharing enabled too. In all 5 VMs, the disk will show up as /dev/sdb.  Later, we’ll create a clustered file system (GFS2) on it.
  5. Similarly, we add another shared disk of 1GB (C:\ClusterStorage\Volume1\quorum_disk.vhdx) with the Shared VHDX feature to all 5 VMs. The small disk will show up as /dev/sdc in the VMs, and later we’ll use it as a Quorum Disk in RHCS. (A PowerShell sketch of these shared-disk steps follows below.)
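Here is that sketch, using the same paths as above; the -SupportPersistentReservations switch is what enables VHDX sharing, and the Add-VMHardDiskDrive lines need to be repeated for my-vm2 through my-vm5.

# Create the shared data disk and the small quorum disk on the CSV
New-VHD -Path 'C:\ClusterStorage\Volume1\100GB-shared-vhdx.vhdx' -Dynamic -SizeBytes 100GB
New-VHD -Path 'C:\ClusterStorage\Volume1\quorum_disk.vhdx'       -Dynamic -SizeBytes 1GB

# Attach both disks to a VM's SCSI controller with sharing enabled
Add-VMHardDiskDrive -VMName 'my-vm1' -ControllerType SCSI `
    -Path 'C:\ClusterStorage\Volume1\100GB-shared-vhdx.vhdx' -SupportPersistentReservations
Add-VMHardDiskDrive -VMName 'my-vm1' -ControllerType SCSI `
    -Path 'C:\ClusterStorage\Volume1\quorum_disk.vhdx' -SupportPersistentReservations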

 

Wrap Up

This completes the first phase of setting up Linux OS clusters.  The Hyper-V hosts are running and configured, and we have five CentOS VMs running on those Hyper-V hosts.  We have a Hyper-V Cluster Shared Volume (CSV) that is located on an iSCSI target and that contains the virtual hard disks for each of the five VMs.

The next blog post will describe how to actually setup the Linux OS clusters using the Red Hat Cluster Suite.

 

~ Dexuan Cui

Setting up Linux Operating System Clusters on Hyper-V (2 of 3)


Author: Dexuan Cui

Link to Part 1 Setting up Linux Operating System Clusters on Hyper-V

Background

This blog post is the second in a series of three that walks through setting up Linux operating system clusters on Hyper-V.  The walk-through uses Red Hat Cluster Suite (RHCS) as the clustering software and Hyper-V’s Shared VHDX as the shared storage needed by the cluster software.

Part 1 of the series showed how to set up a Hyper-V host cluster and a shared VHDX.  Then it showed how to set up five CentOS 6.7 VMs in the host cluster, all using the shared VHDX.

This post will set up the Linux OS cluster with the CentOS 6.7 VMs, running RHCS and the GFS2 file system.  RHCS is specifically for use with RHEL/CentOS 6.x; RHEL/CentOS 7.x uses a different clustering software package that is not covered by this walk through.  The GFS2 file system is specifically designed to be used on shared disks accessed by multiple nodes in a Linux cluster, and so is a natural example to use.

Let’s get started!

Set up a guest cluster with the five CentOS 6.7 VMs running RHCS + the GFS2 file system

  1. On one node of the Linux OS cluster, say, my-vm1, install the web-based HA configuration tool luci
# yum groupinstall "High Availability Management"
# chkconfig luci on; service luci start
  2. On all 5 nodes, install RHCS and make the proper configuration changes:
# yum groupinstall "High Availability" "Resilient Storage"
# chkconfig iptables off
# chkconfig ip6tables off
# chkconfig NetworkManager off

Disable SELinux by

edit /etc/selinux/config: SELINUX=disabled
# setenforce 0
# passwd ricci        [this user/password is to login the web-based HA configuration tool luci]
# chkconfig ricci on; service ricci start
# chkconfig cman on; chkconfig clvmd on
# chkconfig rgmanager on; chkconfig modclusterd on
# chkconfig gfs2 on
# reboot               [Can also choose to start the above daemons manually without reboot]

After steps 1 and 2, we should reboot all the nodes to make the changes take effect, or manually start and stop the above service daemons on every node.

Optionally, remove the “rhgb quiet” kernel parameters for every node, so you can easily see which cluster daemon fails to start on VM bootup.

  3. Use a web browser to access https://my-vm1:8084 (the web-based HA configuration tool luci). First log in as root and grant the user ricci permission to administer and create a cluster, then log out and log back in as ricci.

 

  4. Create a 5-node cluster “my-cluster”
    image1
    image2
    image3
    We can confirm the cluster is created properly by checking the status of the service daemons and checking the cluster status (clustat):
    service modclusterd status
    service cman status
    service clvmd status
    service rgmanager status
    clustat
    
    

    e.g., when we run the commands in my-vm3, we get:
    image4

  5. Add a fencing device (we use SCSI-3 Persistent Reservation) and associate all the VMs with it. Fencing is used to prevent erroneous/unresponsive nodes from accessing the shared storage, so data consistency can be achieved.

    See below for an excerpt on I/O fencing and SCSI-3 PR:

    “SCSI-3 PR, which stands for Persistent Reservation, supports multiple nodes accessing a device while at the same time blocking access to other nodes. SCSI-3 PR reservations are persistent across SCSI bus resets or node reboots and also support multiple paths from host to disk.  SCSI-3 PR uses a concept of registration and reservation. Systems that participate, register a key with SCSI-3 device. Each system registers its own key. Then registered systems can establish a reservation. With this method, blocking write access is as simple as removing registration from a device. A system wishing to eject another system issues a preempt and abort command and that ejects another node. Once a node is ejected, it has no key registered so that it cannot eject others. This method effectively avoids the split-brain condition.”

    This is how we add SCSI3 PR in RHCS:

    image5

    image6

    image7

    NOTE 1: in /etc/cluster/cluster.conf, we need to manually specify devices=”/dev/sdb” and add a <unfence> for every VM. The web-based configuration tool doesn’t support this, but we do need this, otherwise cman can’t work properly.

    NOTE 2: when we change /etc/cluster/cluster.conf manually, remember to increase “config_version” by 1 and propagate the new configuration to other nodes by “cman_tool version -r”.

  6. Add a Quorum Disk to help better cope with the split-brain issue. “In RHCS, CMAN (Cluster MANager) keeps track of membership by monitoring messages from other cluster nodes. When cluster membership changes, the cluster manager notifies the other infrastructure components, which then take appropriate action. If a cluster node does not transmit a message within a prescribed amount of time, the cluster manager removes the node from the cluster and communicates to other cluster infrastructure components that the node is not a member. Other cluster infrastructure components determine what actions to take upon notification that a node is no longer a cluster member. For example, Fencing would disconnect the node that is no longer a member. A cluster can only function correctly if there is general agreement between the members regarding their status. We say a cluster has quorum if a majority of nodes are alive, communicating, and agree on the active cluster members. For example, in a thirteen-node cluster, quorum is only reached if seven or more nodes are communicating. If the seventh node dies, the cluster loses quorum and can no longer function. A cluster must maintain quorum to prevent split-brain issues. Quorum doesn’t prevent split-brain situations, but it does decide who is dominant and allowed to function in the cluster. Quorum is determined by communication of messages among cluster nodes via Ethernet. Optionally, quorum can be determined by a combination of communicating messages via Ethernet and through a quorum disk. For quorum via Ethernet, quorum consists of a simple majority (50% of the nodes + 1 extra). When configuring a quorum disk, quorum consists of user-specified conditions.”

    In our 5-node cluster, if more than 2 nodes fail, the whole cluster will stop working.

    Here we’d like to keep the cluster working even if there is only 1 node alive, that is, the “Last Man Standing” functionality (see How to Optimally Configure a Quorum Disk in Red Hat Enterprise Linux Clustering and High-Availability Environments), so we’re going to set up a quorum disk.

  • In my-vm1, use “fdisk /dev/sdc” to create a partition. Here we don’t run mkfs against it.
  • Run “mkqdisk -c /dev/sdc1 -l myqdisk” to initialize the qdisk partition and run “mkqdisk  -L” to confirm it’s done successfully.
  • Use the web-based tool to configure the qdisk:
    image8
    Here a heuristic is defined to help check the health of every node. On every node, the ping command is run every 2 seconds. If 10 successful runs of ping aren’t achieved within (2*10 = 20) seconds, the node considers itself failed. As a consequence, it won’t vote, it will be fenced, and the node will try to reboot itself.

    After we “apply” the configuration in the Web GUI, /etc/cluster/cluster.conf is updated with the new lines:

    <cman expected_votes="9"/>
    <quorumd label="myqdisk" min_score="1">
        <heuristic program="ping -c3 -t2 10.156.76.1" score="2" tko="10"/>
    </quorumd>

    And “clustat” and “cman_tool status” shows:

    [root@my-vm1 ~]# clustat
    Cluster Status for my-cluster @ Thu Oct 29 14:11:16 2015
    
    Member Status: Quorate 
    Member Name                                                     ID   Status
    ------ ----                                                     ---- ------
    my-vm1                                                              1 Online, Local
    my-vm2                                                              2 Online
    my-vm3                                                              3 Online
    my-vm4                                                              4 Online
    my-vm5                                                              5 Online
    
    /dev/block/8:33                                               0 Online, Quorum Disk
    [root@my-vm1 ~]# cman_tool status
    Version: 6.2.0
    Config Version: 33
    Cluster Name: my-cluster
    Cluster Id: 25554
    Cluster Member: Yes
    Cluster Generation: 6604
    Membership state: Cluster-Member
    Nodes: 5
    Expected votes: 9
    Quorum device votes: 4
    Total votes: 9
    Node votes: 1
    Quorum: 5
    Active subsystems: 11
    Flags:
    Ports Bound: 0 11 177 178
    Node name: my-vm1
    Node ID: 1
    Multicast addresses: 239.192.99.54
    Node addresses: 10.156.76.74

    Note 1: “Expected votes”: The expected votes value is used by cman to determine if the cluster has quorum.  The cluster is quorate if the sum of votes of existing members is over half of the expected votes value.  Here we have n=5 nodes. RHCS automatically specifies the vote value of the qdisk as n-1 = 4, and the expected votes value is n + (n-1) = 2n – 1 = 9. In the case where only 1 node is alive, the effective vote value is 1 + (n-1) = n, which is larger than (2n-1)/2 = n-1 (integer division), so the cluster will continue to function.

    Note 2: In practice, “ping -c3 -t2 10.156.76.1” wasn’t always reliable – sometimes the ping failed after a timeout of 19 seconds and the related node was rebooted unexpectedly. Maybe it’s due to the firewall rule of the gateway server 10.156.76.1.  In this case, replace “10.156.76.1” with “127.0.0.1” as a workaround.

  •   Create a GFS2 file system in the shared storage /dev/sdb and test IO fencing
    • Create a 30GB LVM partition with fdisk
      [root@my-vm1 ~]# fdisk /dev/sdb
      Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
      Building a new DOS disklabel with disk identifier 0x73312800.
      Changes will remain in memory only, until you decide to write them.
      After that, of course, the previous content won't be recoverable. 
      WARNING: invalid flag 0x0000 of partition table 4 will be corrected by w(rite) 
      WARNING: DOS-compatible mode is deprecated. It's strongly recommended to switch off the mode (command 'c') and change display units to sectors (command 'u').
      Command (m for help): n
      Command action
          e   extended
          p   primary partition (1-4)
      p
      Partition number (1-4): 1
      First cylinder (1-13054, default 1):
      Using default value 1
      Last cylinder, +cylinders or +size{K,M,G} (1-13054, default 13054): +30G
      Command (m for help): p 
      Disk /dev/sdb: 107.4 GB, 107374182400 bytes
      255 heads, 63 sectors/track, 13054 cylinders
      Units = cylinders of 16065 * 512 = 8225280 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0x73312800 
      Device Boot      Start         End      Blocks   Id  System
      /dev/sdb1               1        3917    31463271   83  Linux 
      Command (m for help): t
      Selected partition 1
      Hex code (type L to list codes): 8e
      Changed system type of partition 1 to 8e (Linux LVM)
      
      Command (m for help): w
      The partition table has been altered!
      Calling ioctl() to re-read partition table.
      Syncing disks.
      [root@my-vm1 ~]# 

      NOTE: the above fdisk command is run on node 1. On nodes 2 through 5, we need to run the “partprobe /dev/sdb” command to force the kernel to discover the new partition (another option is to simply reboot nodes 2 through 5).

    • Create physical & logical volumes, run mkfs.gfs2, and mount the file system. Run the following on node 1:
      # pvcreate /dev/sdb1
      # vgcreate my-vg1 /dev/sdb1
      # lvcreate -L +20G -n my-store1 my-vg1
      # lvdisplay /dev/my-vg1/my-store1
      # 
      # mkfs.gfs2 -p lock_dlm -t my-cluster:storage -j5 /dev/mapper/my--vg1-my--store1

      (Note: here “my-cluster” is the cluster name we used in Step 4.)

      Run the following on all 5 nodes:

      # mkdir /mydata
      # echo '/dev/mapper/my--vg1-my--store1 /mydata  gfs2 defaults 0 0' >> /etc/fstab
      # mount /mydata
    • Test read/write on the GFS2 partition
      • Create or write a file /mydata/a.txt on one node, say, node 1
      • On other nodes, say node 3, read /mydata/a.txt and we can immediately see what node 1 wrote
      • On node 3, append a line into the file and on node 1 and the other nodes, the change is immediately visible.
    • Test node failure and IO fencing

      First retrieve all the registrant keys and the registration information:

      [root@my-vm1 mydata]# sg_persist -i -k -d /dev/sdb
       Msft      Virtual Disk      1.0
       Peripheral device type: disk
       PR generation=0x158, 5 registered reservation keys follow:
       0x63d20004
       0x63d20001
       0x63d20003
       0x63d20005
       0x63d20002
      
      [root@my-vm1 mydata]# sg_persist -i -r -d /dev/sdb
       Msft      Virtual Disk      1.0
       Peripheral device type: disk
       PR generation=0x158, Reservation follows:
          Key=0x63d20005
      scope: LU_SCOPE,  type: Write Exclusive, registrants only
      
      

      Then pause node 5 using Hyper-V Manager, so node 5 will be considered dead.
      In a few seconds, node 1 prints the kernel messages:

      dlm: closing connection to node5
      GFS2: fsid=my-cluster:storage.0: jid=4: Trying to acquire journal lock...
      GFS2: fsid=my-cluster:storage.0: jid=4: Looking at journal...
      GFS2: fsid=my-cluster:storage.0: jid=4: Acquiring the transaction lock...
      GFS2: fsid=my-cluster:storage.0: jid=4: Replaying journal...
      GFS2: fsid=my-cluster:storage.0: jid=4: Replayed 3 of 4 blocks
      GFS2: fsid=my-cluster:storage.0: jid=4: Found 1 revoke tags
      GFS2: fsid=my-cluster:storage.0: jid=4: Journal replayed in 1s
      GFS2: fsid=my-cluster:storage.0: jid=4: Done
      
      

      And nodes 2 through 4 print these messages:

      dlm: closing connection to node5
      GFS2: fsid=my-cluster:storage.2: jid=4: Trying to acquire journal lock...
      GFS2: fsid=my-cluster:storage.2: jid=4: Busy, retrying...

      Now on nodes 1 through 4, “clustat” shows node 5 is offline and “cman_tool status” shows the current “Total votes: 8”.   And the sg_persist commands show that the current SCSI owner of /dev/sdb has changed from node 5 to node 1 and that there are only 4 registered keys:

      [root@my-vm4 ~]# sg_persist -i -k -d /dev/sdb
      Msft      Virtual Disk      1.0
      Peripheral device type: disk
      PR generation=0x158, 4 registered reservation keys follow:
      0x63d20002
      0x63d20003
      0x63d20001
      0x63d20004
      [root@my-vm4 ~]# sg_persist -i -r -d /dev/sdb
      Msft      Virtual Disk      1.0
      Peripheral device type: disk
      PR generation=0x158, Reservation follows:
      Key=0x63d20001
      scope: LU_SCOPE,  type: Write Exclusive, registrants only

      In short, the dead node 5 properly went offline and was fenced, and node 1 fixed a file system issue (“Found 1 revoke tags”) by replaying node 5’s GFS2 journal, so we have no data inconsistency.

      Now let’s resume node 5. We’ll find that the cluster still doesn’t accept node 5 as an online cluster member until node 5 reboots and rejoins the cluster in a known-good state.

      Note: node 5 will be automatically rebooted by the qdisk daemon.

  • If we perform the above experiment by shutting down a node’s network (by “ifconfig eth0 down”), e.g., on node 3, we’ll get the same result, that is, node 3’s access to /mydata will be rejected and eventually the qdisk daemon will reboot node 3 automatically.

     

    Wrap Up

    Wow!  That’s a lot of steps, but the result is worth it.  You now have a 5-node Linux OS cluster with a shared GFS2 file system that can be read and written from all nodes.  The cluster uses a quorum disk to prevent split-brain issues.  These steps to set up an RHCS cluster are the same as you would use to set up a cluster of physical servers running CentOS 6.7, but in the Hyper-V environment Linux is running in guest VMs, and the shared storage is created on a Shared VHDX instead of a real physical shared disk.

    In the last blog post, we’ll show setting up a web server on one of the CentOS 6.7 nodes, and demonstrate various failover cases.

     

    ~ Dexuan Cui

    Setting up Linux Operating System Clusters on Hyper-V (3 of 3)


    Author: Dexuan Cui

    Link to Part 2: Setting up Linux Operating System Clusters on Hyper-V
    Link to Part 1: Setting up Linux Operating System Clusters on Hyper-V

    Background

    This blog post is the third in a series of three that walks through setting up Linux operating system clusters on Hyper-V.  The walk-through uses Red Hat Cluster Suite (RHCS) as the clustering software and Hyper-V’s Shared VHDX as the shared storage needed by the cluster software.

    Part 1 of the series showed how to set up a Hyper-V host cluster and a shared VHDX.  Then it showed how to set up five CentOS 6.7 VMs in the host cluster, all using the shared VHDX.

    Part 2 of the series showed how to set up the Linux OS cluster with the CentOS 6.7 VMs, running RHCS and the GFS2 file system.  The GFS2 file system is specifically designed to be used on shared disks accessed by multiple nodes in a Linux cluster, and so is a natural example to use.

    This post now makes use of the Linux OS cluster to provide high availability.  A web server is set up on one of the CentOS 6.7 nodes, and various failover cases are demonstrated.

    Let’s get started!

     

    Set up a web server running on a node and experiment with failover cases

    Note: this is actually an “Active-Passive” cluster (A cluster where only one node runs a given service at a time, and the other nodes are in stand-by to take over, should the need arise). Setting up an “Active-Active” cluster is much more complex, because it requires great awareness of the underlying applications, thus one mostly sees this with very specific applications – e.g. database servers that are designed to support multiple database servers accessing the same database disk storage.

     

    1. Add a Failover Domain
      “A failover domain is a named subset of cluster nodes that are eligible to run a cluster service in the event of a node failure”.

      With the below configuration (priority 1 is the highest priority and priority 5 is the lowest), by default the web server runs on node 1. If node 1 fails, node 2 will take over and run the web server. If node 2 fails, node 3 will take over, etc.

      image1

    2. Install & configure Apache on every node
    # yum install httpd
    # chkconfig httpd off   # ensure Apache does not start automatically at boot; the cluster will start it
    3. On node 1, make the minimal change to the default Apache config file /etc/httpd/conf/httpd.conf (note: /mydata is on the shared GFS2 partition):

    -DocumentRoot "/var/www/html"
    +DocumentRoot "/mydata/html"
    -<Directory "/var/www/html">
    +<Directory "/mydata/html">

    And scp /etc/httpd/conf/httpd.conf to the other 4 nodes.

    Next, add a simple html file /mydata/html/index.html with the below content:

    <html> <body> <h1> "Hello, World" (test page)</h1>  </body> </html>
    
    
    4. Define the “Resources” and “Service Group” of the cluster. Note: here 10.156.76.58 is the “floating IP” (a.k.a. virtual IP). An end user uses http://10.156.76.58 to access the web server, but the web server httpd daemon can be running on any node of the cluster, according to the failover configuration, when some of the nodes fail.

     image2

    image3

    image4

    5. Test the Web Server from another host: use a browser to access http://10.156.76.58/
      image5
      Keep pressing “F5” to refresh the page and everything works fine.

      We can verify the web server is actually running on node1:

      [root@my-vm1 ~]# ps aux | grep httpd
      
      root     13539  0.0  0.6 298432 12744 ?        S<s  21:38   0:00 /usr/sbin/httpd -Dmy_apache -d /etc/httpd -f /etc/cluster/apache/apache:my_apache/httpd.conf -k start
    6. Test Fail Over: shut down node 1 with “shutdown -h now”. The end user will detect this failure immediately by repeatedly pressing F5:
      image6
      In ~15 seconds, the end user finds the web server is back to normal:
      image5

      Now, we can verify the web server is running on node 2:

      [root@my-vm2 ~]# ps aux | grep http

      root     13879  0.0  0.6 298432 12772 ?        S<s  21:58   0:00 /usr/sbin/httpd -Dmy_apache -d /etc/httpd -f /etc/cluster/apache/apache:my_apache/httpd.conf -k start

      And we can check the cluster status:

      [root@my-vm2 ~]# clustat
      Cluster Status for my-cluster @ Thu Oct 29 21:59:40 2015
      
      Member Status: Quorate 
       Member Name                                   ID   Status
       ------ ----                                   ---- ------
       my-vm1                                           1 Offline
       my-vm2                                           2 Online, Local, rgmanager
       my-vm3                                           3 Online, rgmanager
       my-vm4                                           4 Online, rgmanager
       my-vm5                                           5 Online, rgmanager
       /dev/block/8:33                                  0 Online, Quorum Disk
      
      Service Name                        Owner (Last)          State
      ------- ----                        ----- ------          -----
      service:my_service_group            my-vm2                started
    6. Now power off node 2 by clicking the “Turn Off” icon in Virtual Machine Connection. Similarly, node 3 takes over from node 2, and after a transient blackout the end user sees the web server back to normal.

      [root@my-vm3 ~]# clustat
      Cluster Status for my-cluster @ Thu Oct 29 22:03:57 2015
      
      Member Status: Quorate 
       Member Name                                      ID   Status
       ------ ----                                      ---- ------
       my-vm1                                              1 Offline
       my-vm2                                              2 Offline
       my-vm3                                              3 Online, Local, rgmanager
       my-vm4                                              4 Online, rgmanager
       my-vm5                                              5 Online, rgmanager
       /dev/block/8:33                                     0 Online, Quorum Disk
      
       Service Name                            Owner (Last)      State
       ------- ----                            ----- ------      ------          
       service:my_service_group                my-vm3            started
    7. Next, power off nodes 3 and 4; within about 20 seconds the web server is running on node 5, the last remaining node.
    8. Finally, power node 1 back on. After node 1 rejoins the cluster, the web server moves from node 5 back to node 1.

    Summary and conclusions

    We’ve often been asked whether Linux OS clusters can be created for Linux guests running on Hyper-V.  The answer is “Yes!”  This series of three blog posts shows how to set up Hyper-V and make use of the Shared VHDX feature to provide shared storage for the cluster nodes, then how to set up Red Hat Cluster Suite and a shared GFS2 file system, and finally wraps up with a demonstration of a web server that fails over from one cluster node to another.

    Other cluster software is available for other Linux distros and versions, so the process for your particular environment may be different, but the fundamental requirement for shared storage is typically the same across different cluster packages.  Hyper-V and Shared VHDX provide the core infrastructure you need, and then you can install and configure your Linux OS clustering software to meet your particular requirements.

    Thank you for following this series,
    Dexuan Cui

    Linux Integration Services 4.1


    We are pleased to announce the availability of Linux Integration Services (LIS) 4.1. This new release expands supported releases to Red Hat Enterprise Linux, CentOS, and Oracle Linux with Red Hat Compatible Kernel 5.2, 5.3, 5.4, and 7.2. In addition to the latest bug fixes and performance improvements for Linux guests running on Hyper-V this release includes the following new features:

    • Hyper-V Sockets (Windows Server Technical Preview)
    • Manual Memory Hot-Add (Windows Server Technical Preview)
    • SCSI WNN
    • lsvmbus
    • Uninstallation scripts

     

    See the ReadMe file for more information.

     

    Download Location

     

    The Linux Integration Services installation scripts and RPMs are available either as a tar file that can be uploaded to a virtual machine and installed, or an ISO that can be mounted as a CD. The files are available from the Microsoft Download Center here: https://www.microsoft.com/en-us/download/details.aspx?id=51612

     

    A ReadMe file has been provided with information on installation, upgrade, uninstallation, features, and known issues.

     

    See also the TechNet article “Linux and FreeBSD Virtual Machines on Hyper-V” for a comparison of LIS features and best practices for use here: https://technet.microsoft.com/en-us/library/dn531030.aspx

     

    Linux Integration Services code is released under the GNU Public License version 2 (GPLv2) and is freely available at the LIS GitHub project here: https://github.com/LIS

     

    Supported Virtualization Server Operating Systems

     

    Linux Integration Services (LIS) 4.1 allows Linux guests to use Hyper-V virtualization on the following host operating systems:

    • Windows Server 2008 R2 (applicable editions)
    • Microsoft Hyper-V Server 2008 R2
    • Windows 8 Pro and 8.1 Pro
    • Windows Server 2012 and 2012 R2
    • Microsoft Hyper-V Server 2012 and 2012 R2
    • Windows Server Technical Preview
    • Microsoft Hyper-V Server Technical Preview
    • Microsoft Azure.

     

    Applicable Linux Distributions

     

    Microsoft provides Linux Integration Services for a broad range of Linux distros as documented in the “Linux and FreeBSD Virtual Machines on Hyper-V” topic on TechNet. Per that documentation, many Linux distributions and versions have Linux Integration Services built in and do not require installation of this separate LIS package from Microsoft. This LIS package is available for a subset of supported distributions in order to provide the best performance and fullest use of Hyper-V features. It can be installed in the listed distribution versions that do not already have LIS built in, and can be installed as an upgrade in listed distribution versions that already have LIS built in. LIS 4.1 is applicable to the following guest operating systems:

    • Red Hat Enterprise Linux 5.2-5.11 32-bit, 32-bit PAE, and 64-bit
    • Red Hat Enterprise Linux 6.0-6.7 32-bit and 64-bit
    • Red Hat Enterprise Linux 7.0-7.2 64-bit
    • CentOS  5.2-5.11 32-bit, 32-bit PAE, and 64-bit
    • CentOS 6.0-6.7 32-bit and 64-bit
    • CentOS 7.0-7.2 64-bit
    • Oracle Linux 6.4-6.7 with Red Hat Compatible Kernel 32-bit and 64-bit
    • Oracle Linux 7.0-7.2 with Red Hat Compatible Kernel 64-bit

    //build 2016 Container Announcements: Hyper-V Containers and Windows 10 and PowerShell For Docker!


    4/26 – Quick update to this post, the GitHub repo for the new PowerShell module for Docker is now public.  (https://github.com/Microsoft/Docker-PowerShell/)

     

    //Build/ will always be a special place for Windows containers; this was the stage where, last year, we first showed the world a Windows Server Container.  So it’s fitting that back home at //build/ this year we have two new announcements to make.

    First, as we all know, Windows is an operating system that users love to customize! From backgrounds and icon locations to font sizes and window layouts, everyone has their own preferences.  This is even more true for developers: source code locations, debugger configuration, color schemes, environment variables and default tool configurations are important for optimal efficiency.  Whether you are a front-end engineer building a highly scalable presentation layer on top of a multi-layer middle tier, or you are a platform developer building the next amazing database engine, being able to do all of your development in your environment is crucial to your productivity.

    For developers using containers, this has typically required running a server virtual machine on their development machine and then running containers inside that virtual machine.  This leads to complex and sometimes problematic cross-machine orchestration and cumbersome scenarios for code sharing and debugging, fracturing that personal and optimized developer experience.  Today we are incredibly excited to be ending this pain for Windows developers by bringing Hyper-V Containers natively into Windows 10!  This will further empower developers to build amazing cloud applications benefiting from native container capabilities right in Windows.  Since Hyper-V Containers utilize their own instance of the Windows kernel, your container is truly a server container all the way down to the kernel.  Plus, with the flexibility of Windows container runtimes, containers built on Windows 10 can be run on Windows Server 2016 as either Windows Server Containers or Hyper-V Containers.

    Windows Insiders will start to see a new “Containers” feature in the Windows Features dialog in upcoming flights, and with the upcoming release of Windows Server 2016 Technical Preview 5 the Nano Server container OS image will be made available for download along with an updated Docker engine for Windows.  Stay tuned to https://aka.ms/containers for all of the details and a new quick start guide that will get you up and running with Hyper-V Containers on Windows 10 in the near future!
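    Once the feature shows up in your build, it can presumably also be enabled from an elevated PowerShell prompt. This is a minimal sketch, assuming the optional feature names "Microsoft-Hyper-V" and "Containers" (check the names shown in the Windows Features dialog on your flight):

    Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All   # Hyper-V is required for Hyper-V Containers
    Enable-WindowsOptionalFeature -Online -FeatureName Containers               # the new Containers feature
    # A reboot is typically required after enabling these features.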

    Secondly, since last year at //build/ we’ve been asking for your thoughts and feedback on Windows containers and we can’t thank you enough for the forum posts, tweets, GitHub comments, in person conversations etc… (please keep them coming!).  Among all of these comments one request has come up more than any other – why can’t I see Docker containers from PowerShell? 

    As we’ve discussed the pros, cons and various options with you, we’ve come to the conclusion that our current container PowerShell module needs an update… So today we are announcing that we are deprecating the container PowerShell module that has been shipping in the preview builds of Windows Server 2016 and replacing it with a new PowerShell module for Docker.  We have already started development on this new module and will be open sourcing it in the near future as part of what we hope will be a community collaboration on building a great PowerShell experience for containers through the Docker engine.  This new module builds directly on top of the Docker Engine’s REST interface, enabling user choice between the Docker CLI, PowerShell or both.

    Building a great PowerShell module is no easy task: getting all of the code right and striking the right balance of objects, parameter sets and cmdlet names are all super important.  So as we embark on this new module we are going to be looking to you – our end users and the vast PowerShell and Docker communities – to help shape it.  What parameter sets are important to you?  Should we have an equivalent to “docker run”, or should you pipe new-container to start-container – what would you want?  To learn more about this module and participate in the development we will be launching a new page at https://aka.ms/windowscontainers/powershell in the next few days, so head on over, take a look and let us know what you think!

    -Taylor

    Windows Container Networking


    Actual Author:  Jason Messer

    All of the technical documentation corresponding to this post is available here.

    There is a lot of excitement and energy around the introduction of Windows containers and Microsoft’s partnership with Docker. For Windows Server Technical Preview 5, we invested heavily in the container network stack to better align with the Docker management experience and brought our own networking expertise to add additional features and capabilities for Windows containers! This article will describe the Windows container networking stack, how to attach your containers to a network using Docker, and how Microsoft is making containers first-class citizens in the modern datacenter with Microsoft Azure Stack.

    Introduction

    Windows Containers can be used to host all sorts of different applications from web servers running Node.js to databases, to video streaming. These applications all require network connectivity in order to expose their services to external clients. So what does the network stack look like for Windows containers? How do we assign an IP address to a container or attach a container endpoint to a network? How do we apply advanced network policy such as maximum bandwidth caps or access control list (ACL) rules to a container?

    Let’s dive into this topic by first looking at a picture of the container’s network stack in Figure 1.


    Figure 1 – Windows Container Network Stack

    All containers run inside a container host which could be a physical server, a Windows client, or a virtual machine. It is assumed that this container host already has network connectivity through a NIC (Wi-Fi or Ethernet), and this connectivity needs to be extended to the containers themselves. The container host uses a Hyper-V virtual switch to provide this connectivity to the containers and connects the containers to the virtual switch (vSwitch) using either a Host virtual NIC (Windows Server Containers) or a Synthetic VM NIC (Hyper-V Containers). Compare this with Linux containers, which use a bridge device instead of the Hyper-V Virtual Switch and veth pairs instead of vNICs / vmNICs to provide this basic Layer-2 (Ethernet) connectivity to the containers themselves.

    The Hyper-V virtual switch alone does not allow network services running in a container to be accessible from the outside world, however. We also need Layer-3 (IP) connectivity to correctly route packets to their intended destination. In addition to IP, we need higher-level networking protocols such as TCP and UDP to correctly address specific services running in a container using a port number (e.g. TCP Port 80 is typically used to access a web server). Additional Layer 4-7 services such as DNS, DHCP, HTTP, SMB, etc. are also required for containers to be useful. All of these options and more are supported with Windows container networking.

    Docker Network Configuration and Management Stack

    New in Windows Server Technical Preview 5 (TP5) is the ability to set up container networking using the Docker client and Docker engine’s RESTful API. Network configuration settings can be specified either at container network creation time or at container creation time, depending upon the scope of the setting. Reference the MSDN documentation for more information.

    The Windows Container Network management stack uses Docker as the management surface and the Windows Host Network Service (HNS) as a servicing layer to create the network “plumbing” underneath (e.g. vSwitch, WinNAT, etc.). The Docker engine communicates with HNS through a network plug-in (libnetwork). Reference Figure 2 to see the updated management stack.


    Figure 2 – Management Stack

    With this Docker network plugin interfacing with the Windows network stack through HNS, users no longer have to create their own static port mappings or custom Windows Firewall rules for NAT as these are automatically created for you.

    Example: Create static Port Mapping through Docker
    Note: NetNatStaticMapping (and Firewall Rule) created automatically

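    The screenshot from the original post is not reproduced here; the following is a minimal sketch of the same flow, with an illustrative image name and port numbers:

    docker run -d -p 8080:80 --name web microsoft/iis     # map host TCP port 8080 to container port 80
    Get-NetNatStaticMapping                                # shows the static mapping WinNAT created automatically (a matching firewall rule is created as well)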

     

    Networking Modes

    Windows containers will attach to a container host network using one of four different network modes (or drivers). The networking mode used determines how the containers will be accessible to external clients, how IP addresses will be assigned, and how network policy will be enforced.

    Each of these networking modes uses an internal or external VM Switch – created automatically by HNS – to connect containers to the container host’s physical (or virtual) network. Briefly, the four networking modes are given below with recommended usage. Please refer to the MSDN article (here) for more in-depth information about each mode:

    • NAT – this is the default network mode and attaches containers to a private IP subnet. This mode is quick and easy to use in any environment.
    • Transparent – this networking mode attaches containers directly to the physical network without performing any address translation. Use this mode with care as it can quickly cause problems in the physical network when too many containers are running on a particular host.
    • L2 Bridge / L2 Tunnel – these networking modes should usually be reserved for private and public cloud deployments when containers are running on a tenant VM.

    Note: The “NAT” VM Switch Type will no longer be available in Windows Server 2016 or Windows 10 client builds. NAT container networks can be created by specifying the “nat” driver in Docker or NAT Mode in PowerShell.

    Example: Create Docker ‘nat’ network
    Notice how VM Switch and NetNat are created automatically

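    Again, the screenshot is not reproduced here; this is a minimal sketch of the commands it showed, with an illustrative network name:

    docker network create -d nat MyNatNetwork    # create a NAT network through the 'nat' driver
    Get-VMSwitch                                 # an internal vSwitch is created automatically by HNS
    Get-NetNat                                   # ...together with the WinNAT instance behind it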

    Container Networking + Software Defined Networking (SDN)

    Containers are increasingly becoming first-class citizens in the datacenter and enterprise alongside virtual machines. IaaS cloud tenants or enterprise business units need to be able to programmatically define network policy (e.g. ACLs, QoS, load balancing) for both VM network adapters as well as container endpoints. The Software Defined Networking (SDN) Stack (TechNet topic) in Windows Server 2016 allows customers to do just that by creating network policy for a specific container endpoint through the Windows Network Controller using either PowerShell scripts, SCVMM, or the new Azure Portal in the Microsoft Azure Stack.

    In a virtualized environment, the container host will be a virtual machine running on a physical server. The Network Controller will send policy down to a Host Agent running on the physical server using standard SouthBound channels (e.g. OVSDB). The Host Agent will then program this policy into the VFP extension in the vSwitch on the physical server where it will be enforced. This network policy is specific to an IP address (e.g. container end-point) so that even though multiple container endpoints are attached through a container host VM using a single VM network adapter, network policy can still be granularly defined.

    Using the L2 tunnel networking mode, all container network traffic from the container host VM will be forwarded to the physical server’s vSwitch. The VFP forwarding extension in this vSwitch will enforce the policy received from the Network Controller and higher-levels of the Azure Stack (e.g. Network Resource Provider, Azure Resource Manager, Azure Portal). Reference Figure 3 to see how this stack looks.

    Figure 3 – Containers attaching to SDN overlay virtual network

    This will allow containers to join overlay virtual networks (e.g. VxLAN) created by individual cloud tenants, to communicate across multi-node clusters and with other VMs, as well as to receive network policy.

    Future Goodness

    We will continue to innovate in this space not only by adding code to the Windows OS but also by contributing code to the open source Docker project on GitHub. We want Windows container users to have full access to the rich set of network policy and be able to create this policy through the Docker client. We’re also looking at ways to apply network policy as close to the container endpoint as possible in order to shorten the data-path and thereby improve network throughput and decrease latency.

    Please continue to offer your feedback and comments on how we can continue to improve Windows Containers!

    ~ Jason Messer

    What Happened to the “NAT” VMSwitch?


    Author: Jason Messer

    Beginning in Windows Server Technical Preview 3, our users noticed a new Hyper-V Virtual Switch Type – “NAT” – which was introduced to simplify the process of connecting Windows containers to the host using a private network. This allowed network traffic sent to the host to be redirected to individual containers running on the host through network and port address translation (NAT and PAT) rules. Additionally, users began to use this new VM Switch type not only for containers but also for ordinary VMs, to connect them to a NAT network. While this may have simplified the process of creating a NAT network and connecting containers or VMs to a vSwitch, it resulted in confusion and a layering violation in the network stack.

    Beginning in Windows Server Technical Preview 5 and with recent Windows Insider Builds, the “NAT” VM Switch Type has been removed to resolve this layering violation.

    In the OSI (Open Systems Interconnect) model, both physical network switches and virtual switches operate at Layer-2 of the network stack without any knowledge of IP addresses or ports. These switches simply forward packets based on the Ethernet headers (i.e. MAC addresses) in the Layer-2 frame. NAT and PAT operate at Layers-3 and 4 respectively of the network stack.

    Layer              Function                                                Example
    Application (7)    Network process                                         HTTP, SMTP, DNS
    Presentation (6)   Data representation and encryption                      JPG, GIF, SSL, ASCII
    Session (5)        Interhost communication                                 NetBIOS
    Transport (4)      End-to-end connections                                  TCP, UDP (ports)
    Network (3)        Path determination and routing based on IP addresses    Routers
    Data Link (2)      Forward frames based on MAC addresses                   802.3 Ethernet, switches
    Physical (1)       Send data through physical signaling                    Network cables, NIC cards

     

    Creating a “NAT” VM Switch type actually combined several operations into one, which can still be done today (detailed instructions can be found here; a minimal sketch also follows the list):

    • Create an “internal” VM Switch
    • Create a Private IP network for NAT
    • Assign the default gateway IP address of the private network to the internal VM switch Management Host vNIC
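    Here is a minimal PowerShell sketch of those three steps; the switch name and IP prefix are illustrative:

    New-VMSwitch -Name "NATSwitch" -SwitchType Internal
    New-NetIPAddress -InterfaceAlias "vEthernet (NATSwitch)" -IPAddress 172.16.0.1 -PrefixLength 24   # default gateway of the private network, assigned to the internal switch's Host vNIC
    New-NetNat -Name "MyNAT" -InternalIPInterfaceAddressPrefix 172.16.0.0/24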

    In Technical Preview 5 we have also introduced the Host Network Service (HNS) for containers, which is a servicing layer used by both the Docker and PowerShell management surfaces to create the required network “plumbing” for new container networks. A user who wants to create a NAT container network through Docker will simply execute the following:

    c:\> docker network create -d nat MyNatNetwork

    and HNS will take care of the details such as creating the internal vSwitch and NAT.

    Looking forward, we are considering how we can create a single arbitrator for all host networking (regardless of containers or VMs) so that these workflows and networking primitives will be consistent.

    ~ Jason


    Windows NAT (WinNAT) — Capabilities and limitations


    Author: Jason Messer

     

    How many devices (e.g. laptops, smart phones, tablets, DVRs, etc.) do you have at home which connect to the internet? Each of these devices probably has an IP address assigned to it, but did you know that the public internet actually only sees one IP address for all of these devices? How does this work?

    Network Address Translation (NAT) allows users to create a private, internal network which shares a public IP address(es). When a new connection is established (e.g. web browser session to www.bing.com) the NAT translates the private (source) IP address assigned to your device to the shared public IP address which is routable on the internet and creates a new entry in the NAT flow state table. When a connection comes back into your network, the public (destination) IP address is translated to the private IP address based on the matching flow state entry.

    This same NAT technology and concept can also work with host networking using virtual machine and container endpoints running on a single host. IP addresses from the NAT internal subnet (prefix) can be assigned to VMs, containers, or other services running on this host. Similar to how the NAT translates the source IP address of a device, NAT can also translate the source IP address of a VM or container to the IP address of a virtual network adapter (Host vNIC) on the host.

    Figure 1: Physical vs Virtual NAT

     

    Similarly, the TCP/UDP ports can also be translated (Port Address Translation – PAT) so that traffic received on an external port can be forwarded to a different, internal port. These are known as static mappings.

    Figure 2: Static Port Mappings

    Host Networking with WinNAT to attach VMs and Containers

    The first step in creating a host network for VMs and containers is to create an internal Hyper-V virtual switch in the host. This provides Layer-2 (Ethernet) internal connectivity between the endpoints. In order to obtain external connectivity through a NAT (using WinNAT), we add a Host vNIC to the internal vSwitch and assign the default gateway IP address of the NAT to this vNIC. This essentially creates a router so that any network traffic from one of the endpoints that is destined for an IP address outside of the internal network (e.g. bing.com) will go through the NAT translation process.

    Note: when the Windows container feature is installed, the docker daemon creates a default NAT network automatically when it starts. To stop this network from being created, make sure the docker daemon (dockerd) is started with the ‘-b “none”’ argument specified.

    In addition to address translation, WinNAT also allows users to create static port mappings or forwarding rules so that internal endpoints can be accessed from external clients. Take for example an IIS web server running in a container attached to the default NAT network. The IIS web server will be listening on port 80 and so it requires that any connections coming in on a particular port to the host from an external client will be forwarded or mapped to port 80 on the container. Reference Figure 2 above to see port 8080 on the host being mapped to port 80 on the container.
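    Outside of Docker (for example, for a VM attached to the NAT network), an equivalent static mapping can be created directly with WinNAT. A sketch with illustrative names and addresses:

    Add-NetNatStaticMapping -NatName "MyNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 8080 -InternalIPAddress 172.16.1.100 -InternalPort 80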

    In order to create a NAT network to connect VMs, please follow these instructions: https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/user_guide/setup_nat_network

    In order to create a NAT network for containers (or use the default nat network) please follow these instructions:
    https://msdn.microsoft.com/en-us/virtualization/windowscontainers/management/container_networking

    Key Limitations in WinNAT (Windows NAT implementation)

    • Multiple internal subnet prefixes not supported
    • External / Internal prefixes must not overlap
    • No automatic networking configuration
    • Cannot access externally mapped NAT endpoint IP/ ports directly from host – must use internal IP / ports assigned to endpoints

    Multiple Internal Subnet Prefixes

    Consider the case where multiple applications / VMs / containers each require access to a private NAT network. WinNAT only allows for one internal subnet prefix to be created on a host which means that if multiple applications or services need NAT connectivity, they will need to coordinate between each other to share this internal NAT network; each application cannot create its own individual NAT network. This may require creating a larger private NAT subnet (e.g. /18) for these endpoints. Moreover, the private IP addresses assigned to the endpoints cannot be re-used so that IP allocation also needs to be coordinated.

    Figure 3: Multiple Internal NAT Subnets are not allowed – combined into a larger, shared subnet

    Lastly, individual external host ports can only be mapped to one internal endpoint. A user cannot create two static port mappings with external port 80 and have traffic directed to two different internal endpoints. These static port mappings must be coordinated between applications and services requiring NAT. WinNAT does not support dynamic port mappings (i.e. allowing WinNAT to automatically choose an external – or ephemeral – host port to be used by the mapping).

    Note: Dynamic port mappings are supported through docker run with the -p | -P options since IP address management (IPAM) is handled by the Host Network Service (HNS) for containers.

    Overlapping External and Internal IP Prefixes

    A NAT network may be created on either a client or server host. When the NAT is first created, it must be ensured that the internal IP prefix defined does not overlap with the external IP addresses assigned to the host.

    Example – This is not allowed:

    Internal, Private IP Subnet: 172.16.0.0/12
    IP Address assigned to the Host: 172.17.0.4

    If a user is roaming on a laptop and connects to a different physical network such that the container host’s IP address is now within the private NAT network, the internal IP prefix of the NAT will need to be modified so that it does not overlap.
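    A quick way to compare the two before creating (or re-creating) the NAT, as a rough sketch:

    Get-NetIPAddress -AddressFamily IPv4 | Select-Object InterfaceAlias, IPAddress    # addresses currently assigned to the host
    Get-NetNat | Select-Object Name, InternalIPInterfaceAddressPrefix                 # internal prefix of any existing NAT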

    Automatic Network Configuration

    WinNAT itself does not dynamically assign IP addresses, routes, DNS servers, or other network information to an endpoint. For container endpoints, since HNS manages IPAM, HNS will assign IP networking information from the NAT network to container endpoints. However, if a user is creating a VM and connecting a VM Network Adapter to a NAT network, the admin must assign the IP configuration manually inside the VM.
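    Inside such a VM, the manual configuration might look like the following sketch; the addresses are illustrative and must fall within the NAT’s internal prefix, with the gateway being the Host vNIC address:

    New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 172.16.0.10 -PrefixLength 24 -DefaultGateway 172.16.0.1
    Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses 10.10.50.50    # a DNS server reachable from the NAT network (illustrative)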

    Accessing internal endpoints directly from the Host

    Internal endpoints assigned to VMs or containers cannot be accessed from the NAT host using the external IPs / ports referenced in NAT static port mappings. From the NAT host, these internal endpoints must be addressed directly by their internal IP and ports. For instance, assume a container endpoint has IP 172.16.1.100 and is running a web server which is listening on port 80. Moreover, assume a port mapping has been created through docker to forward traffic from the host’s IP address (10.10.50.20) received on TCP port 8080 to the container endpoint. In this case, a user on the container host cannot access the web server through the externally mapped port, i.e. a user operating on the container host cannot reach the container web server at http://10.10.50.20:8080. Instead, the user must access the container web server directly at http://172.16.1.100:80.

    The one caveat to this limitation is that the internal endpoint can be accessed using the external IP/port from a separate VM/container endpoint running on the same NAT host: this is called hair-pinning. E.g. a user operating on container A can access a web server running in container B using the external IP and port, i.e. http://10.10.50.20:8080.
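    To make the two access paths concrete, using the addresses from the example above:

    Invoke-WebRequest http://172.16.1.100:80 -UseBasicParsing     # from the NAT host itself: address the endpoint directly
    Invoke-WebRequest http://10.10.50.20:8080 -UseBasicParsing    # from another container/VM on the same host, the external mapping works (hair-pinning)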

    Configuration Example: Attaching VMs and Containers to a NAT network

    If you need to attach multiple VMs and containers to a single NAT, you will need to ensure that the NAT internal subnet prefix is large enough to encompass the IP ranges being assigned by different applications or services (e.g. Docker for Windows and Windows Container – HNS). This will require either application-level assignment of IPs and network configuration or manual configuration which must be done by an admin and guaranteed not to re-use existing IP assignments on the same host.

    The solution below will allow both Docker for Windows (Linux VM running Linux containers) and Windows Containers to share the same WinNAT instance using separate internal vSwitches. Connectivity between both Linux and Windows containers will work.

    Example

    • User has connected VMs to a NAT network through an internal vSwitch named “VMNAT” and now wants to install Windows Container feature with docker engine
      1. PS C:\> Get-NetNat -Name "VMNAT" | Remove-NetNat (this will remove the NAT but keep the internal vSwitch).
      2. Install Windows Container Feature
      3. DO NOT START Docker Service (daemon)
      4. Edit the arguments passed to the docker daemon (dockerd) by adding the --fixed-cidr=<container prefix> parameter. This tells Docker to create the default nat network with the IP subnet <container prefix> (e.g. 192.168.1.0/24) so that HNS can allocate IPs from this prefix.
      5. PS C:\> Start-Service Docker; Stop-Service Docker
      6. PS C:\> Get-NetNat | Remove-NetNAT (again, this will remove the NAT but keep the internal vSwitch)
      7. PS C:\> New-NetNat -Name SharedNAT -InternalIPInterfaceAddressPrefix <shared prefix>
      8. PS C:\> Start-Service docker

    Docker/HNS will assign IPs to Windows containers from the <container prefix>

    Admin will assign IPs to VMs from the difference set of the <shared prefix> and <container prefix>

     

    • User has installed Windows Container feature with docker engine running and now wants to connect VMs to the NAT network
      1. PS C:\> Stop-Service docker
      2. PS C:\> Get-ContainerNetwork | Remove-ContainerNetwork -force
      3. PS C:\> Get-NetNat | Remove-NetNat (this will remove the NAT but keep the internal vSwitch)
      4. Edit the arguments passed to the docker daemon (dockerd) by adding the -b "none" option to the end of the docker daemon (dockerd) command to tell Docker not to create a default NAT network.
      5. PS C:\> New-ContainerNetwork -Name nat -Mode NAT -SubnetPrefix <container prefix> (create a new NAT and internal vSwitch – HNS will allocate IPs to container endpoints attached to this network from the <container prefix>)
      6. PS C:\> Get-Netnat | Remove-NetNAT (again, this will remove the NAT but keep the internal vSwitch)
      7. PS C:\> New-NetNat -Name SharedNAT -InternalIPInterfaceAddressPrefix <shared prefix>
      8. PS C:\> New-VMSwitch -Name <switch name> -SwitchType Internal (attach VMs to this new vSwitch)
      9. PS C:\> Start-Service docker

    Docker/HNS will assign IPs to Windows containers from the <container prefix>

    Admin will assign IPs to VMs from the difference set of the <shared prefix> and <container prefix>

    In the end, you should have two internal VM switches and one NetNat shared between them.
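    A quick check of that end state:

    Get-VMSwitch -SwitchType Internal    # should list the two internal switches
    Get-NetNat                           # should list the single shared NAT (SharedNAT)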

    Troubleshooting

    1. Make sure you only have one NAT
       Get-NetNat
    2. If a NAT already exists, please delete it
       Get-NetNat | Remove-NetNat
    3. Make sure you only have one “internal” vmSwitch for the application or feature (e.g. Windows containers). Record the name of the vSwitch for Step 4
       Get-VMSwitch
    4. Check to see if there are private IP addresses (e.g. NAT default Gateway IP Address – usually *.1) from the old NAT still assigned to an adapter
       Get-NetIPAddress -InterfaceAlias "vEthernet (<name of vSwitch>)"
    5. If an old private IP address is in use, please delete it
       Remove-NetIPAddress -InterfaceAlias "vEthernet (<name of vSwitch>)" -IPAddress <IPAddress>

    Removing Multiple NATs

    We have seen reports of multiple NAT networks created inadvertently. This is due to a bug in recent builds (including Windows Server 2016 Technical Preview 5 and Windows 10 Insider Preview builds). If you see multiple NAT networks, after running docker network ls or Get-ContainerNetwork, please perform the following from an elevated PowerShell:

    $KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\vmsmp\parameters\SwitchList"
    $keys = Get-ChildItem $KeyPath
    foreach ($key in $keys)
    {
        # Remove the registry entries for switches named 'nat'
        if ($key.GetValue("FriendlyName") -eq 'nat')
        {
            $newKeyPath = $KeyPath + "\" + $key.PSChildName
            Remove-Item -Path $newKeyPath -Recurse
        }
    }
    # Remove all NAT instances and any remaining container networks
    Remove-NetNat -Confirm:$false
    Get-ContainerNetwork | Remove-ContainerNetwork

    Restart the Computer

     

    ~ Jason Messer

    Which Linux Integration Services should I use in my Linux VMs?


    Overview
    If you run Linux guest VMs on Hyper-V, you may wonder about how to get the “best” Linux Integration Services (LIS) for your Linux distribution and usage scenario.  Getting the “best” is a bit nuanced, so this blog post gives a detailed explanation to enable you to make the right choice for your situation.

    Microsoft has two separate tracks for delivering LIS.  It’s important to understand that the tracks are separate, and don’t overlap with each other.  You have to decide which track works best for you.

    “Built-in” LIS
    One track is through the Linux distro vendors, such as Red Hat, SUSE, Oracle, Canonical, and the Debian community.  Developers from Microsoft and the Linux community at large submit LIS updates to the Linux Kernel Mailing List, and get code review feedback from the Linux community.  When the feedback process completes, the changes are incorporated into the upstream Linux kernel as maintained by Linus Torvalds and the Linux community “maintainers”.

    After acceptance Microsoft works with the distro vendors to backport those changes into whatever Linux kernel the Linux distro vendors are shipping.  The distro vendors take the changes, then build, test, and ultimately ship LIS as part of their release.  Microsoft gets early versions of the releases, and we test as well and give feedback to the distro vendor.  Ultimately we converge at a point where we’re both happy with the release. We do this with Red Hat, SUSE, Canonical, Oracle, etc. and so this process covers RHEL, CentOS, SLES, Oracle Linux, and Ubuntu.  Microsoft also works with the Debian community to accomplish the same thing.

    This track is what our documentation refers to as “built-in”.  You get LIS from the distro vendor as part of the distro release.  And if you upgrade from CentOS 7.0 to 7.1, you’ll get updated LIS with the 7.1 update, just like any other Linux kernel updates.  Same from 7.1 to 7.2. This track is the easiest track, because you don’t do anything special or extra for LIS – it’s just part of the distro release.  It’s important to note that we don’t assign a version number to the LIS that you get this way.  The specific set of LIS changes that you get depends on exactly when the distro vendor pulled the latest updates from the upstream Linux kernel, what they were able to include (they often don’t include every change due to the risk of destabilizing), and various other factors.  The tradeoff with the “built-in” approach is that you won’t always have the “latest and greatest” LIS code because each distro release is a snapshot in time.  You can upgrade to a later distro version, and, for example, CentOS 7.2 will be a later snapshot than CentOS 7.1.  But there are inherent delays in the process.  Distro vendors have freeze dates well in advance of a release so they can test and stabilize.  And, CentOS, in particular, depends on the equivalent RHEL release.

    End customer support for “built-in” LIS is via your Linux distro vendor under the terms of the support agreement you have with that vendor.  Microsoft customer support will also engage under the terms of your support agreement for Hyper-V.   In either case, fixing an actual bug in the LIS code will likely be done jointly by Microsoft and the distro vendor.  Delivery of such updated code will come via your distro vendor’s normal update processes.

    Microsoft LIS Package
    The other track is the Microsoft-provided LIS package, which is available for RHEL, CentOS, and the Red Hat Compatible Kernel in Oracle Linux.  LIS is still undergoing a moderate rate of change as we make performance improvements, handle new things in Azure, and support the Windows Server 2016 release with a new version of Hyper-V.  As an alternative to the “built-in” LIS described above, Microsoft provides an LIS package that is the “latest and greatest” code changes.  We provide this package backported to a variety of older RHEL and CentOS distro versions so that customers who don’t stay up-to-date with the latest version from a distro vendor can still get LIS performance improvements, bug fixes, etc.   And without the need to work through the distro vendor, the Microsoft package has shorter process delays and can be more “up-to-date”.   But note that over time, everything in the Microsoft LIS package shows up in a distro release as part of the “built-in” LIS.  The Microsoft package exists only to reduce the time delay, and to provide LIS improvements to older distro versions without having to upgrade the distro version.

    The Microsoft-provided LIS packages are assigned version numbers.  That’s the LIS 4.0, 4.1 (and the older 3.5) that you see in the version grids in the documentation, with a link to the place you can download it.  Make sure you get the latest version, and ensure that it is applicable to the version of RHEL/CentOS that you are running, per the grids.

    The tradeoff with the Microsoft LIS package is that we have to build it for specific Linux kernel versions.  When you update a CentOS 7.0 to 7.1, or 7.1 to 7.2, you get changes to the kernel from CentOS update repos.  But you don’t get the Microsoft LIS package updates because they are separate.  You have to do a separate upgrade of the Microsoft LIS package.  If you do the CentOS update, but not the Microsoft LIS package update, you may get a binary mismatch in the Linux kernel, and in the worst case, you won’t be able to boot.  The result is that you have extra update steps if you use the Microsoft provided LIS package.  Also, if you are using a RHEL release with support through a Red Hat subscription, the Microsoft LIS package constitutes “uncertified drivers” from Red Hat’s standpoint.  Your support services under a Red Hat subscription are governed by Red Hat’s “uncertified drivers” statement here:  Red Hat Knowledgebase 1067.

    Microsoft provides end customer support for the latest version of the Microsoft-provided LIS package, under the terms of your support agreement for Hyper-V.  If you are running other than the latest version of the LIS package, we’ll probably ask you to upgrade to the latest and see if the problem still occurs.  Because LIS is mostly Linux drivers that run in the Linux kernel, any fixes that Microsoft provides will likely come as a new version of the Microsoft LIS package, rather than as a “hotfix” to an existing version.

    Bottom-line
    In most cases, using the built-in drivers that come with your Linux distro release is the best approach, particularly if you are staying up-to-date with the latest minor version releases.  You should use the Microsoft provided LIS package only if you need to run an older distro version that isn’t being updated by the distro vendor.  You can also run the Microsoft LIS package if you want to be running the latest-and-greatest LIS code to get the best performance, or if you need new functionality that hasn’t yet flowed into a released distro version.  Also, in some cases, when debugging an LIS problem, we might ask you to try the Microsoft LIS package in order to see if a problem is already fixed in code that is later than what is “built-in” to your distro version.

    Here’s a tabular view of the two approaches, and the tradeoffs:

    Version Number
      “Built-in” LIS: No version number assigned.  Don’t try to compare with the “4.0”, “4.1”, etc. version numbers assigned to the Microsoft LIS package.
      Microsoft LIS package: LIS 4.0, 4.1, etc.
    How up to date?
      “Built-in” LIS: Snapshot as of the code deadline for the distro version.
      Microsoft LIS package: Most up-to-date, because released directly by Microsoft.
    Update process
      “Built-in” LIS: Automatically updated as part of the distro update process.
      Microsoft LIS package: Requires a separate step to update the Microsoft LIS package.  Bad things can happen if you don’t do this extra step.
    Can get latest LIS updates for older distro versions?
      “Built-in” LIS: No.  Only path forward is to upgrade to the latest minor version of the distro (6.8, or 7.2, for CentOS).
      Microsoft LIS package: Yes.  Available for a wide range of RHEL/CentOS versions back to RHEL/CentOS 5.2.  See this documentation for details on functionality and limitations for older RHEL/CentOS versions.
    Meets distro vendor criteria for support?
      “Built-in” LIS: Yes.
      Microsoft LIS package: No, for RHEL.  Considered “uncertified drivers” by Red Hat.  Not an issue for CentOS, which has community support.
    End customer support process
      “Built-in” LIS: Via your distro vendor, or via Microsoft support.  LIS fixes delivered by distro vendor normal update processes.
      Microsoft LIS package: Via Microsoft support per your Hyper-V support agreement.  Fixes delivered as a new version of the Microsoft LIS package.

     

    Linux Integration Services download Version 4.1.2


    We are pleased to announce the availability of Linux Integration Services (LIS) 4.1.2. This point release of the LIS download expands supported releases to Red Hat Enterprise Linux, CentOS, and Oracle Linux with Red Hat Compatible Kernel 6.8. This release also includes upstream bug fixes and performance improvements not included in previous LIS downloads.

    See the separate PDF file “Linux Integration Services v4-1c.pdf” for more information.

    The LIS download is an optional way to get Linux Integration Services updates for certain versions of Linux. To determine if you want to download LIS refer to the blog post “Which Linux Integration Services should I use in my Linux VMs?”

    Download Location

    The Linux Integration Services download is available either as a disk image (ISO) or a gzipped tar file. The disk image can be attached to a virtual machine, or the tar file can be uploaded to a virtual machine and expanded to install these kernel modules. Refer to the instruction PDF, named “Linux Integration Services v4-1c.pdf”, available separately from the download.

    https://www.microsoft.com/en-us/download/details.aspx?id=51612

    Linux Integration Services documentation

    See also the TechNet article “Linux and FreeBSD Virtual Machines on Hyper-V” for a comparison of LIS features and best practices for use here: https://technet.microsoft.com/en-us/library/dn531030.aspx

    Source Code
    Linux Integration Services code is open source released under the GNU Public License version 2 (GPLv2) and is freely available at the LIS GitHub project here: https://github.com/LIS and in the upstream Linux kernel: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/

    Waiting for VMs to restart in a complex configuration script with PowerShell Direct


    Have you ever tried to automate the setup of a complex environment including the base OS, AD, SQL, Hyper-V and other components?

    For my demo at Ignite 2016 I did just that.  I would like to share a few things I learned while writing a single PowerShell script that builds the demo environment from scratch. The script heavily uses PowerShell Direct and just requires the installation sources to be placed in specific folders.

    In this blog post I’d like to provide solutions for two challenges that I came across:

    • Determining when a virtual machine is ready for customization using PowerShell Direct, and – as a variation of that theme –
    • Determining when Active Directory is fully up and running in a fully virtualized PoC/demo environment.

    Solution #1 Determining when a virtual machine is ready for customization using PowerShell Direct

    Some guest OS operations require multiple restarts. If you’re using a simple approach to automate everything from a single script and check for the guest OS to be ready, things might go wrong. For example, with a naïve PowerShell Direct call using Invoke-Command, the script might resume while the virtual machine is restarting multiple times to finish up role installation. This can lead to unpredictable behavior and break scripts.

    One solution is using a wrapper function like this:
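    A minimal sketch of such a wrapper, consistent with the description below; the parameter names, time-out value and error handling here are assumptions, and it relies only on standard Hyper-V cmdlets plus PowerShell Direct (New-PSSession -VMName):

    function Invoke-CommandWithPSDirect
    {
        param(
            [string]$VMName,
            [pscredential]$Credential,
            [scriptblock]$ScriptBlock,
            [object[]]$ArgumentList,
            [int]$TimeoutSeconds = 600
        )

        $vm = Get-VM -Name $VMName

        # Make sure the virtual machine is running; start it if necessary
        if ($vm.State -ne 'Running') { Start-VM -VM $vm | Out-Null }

        $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

        # If the heartbeat integration component is enabled, wait for a healthy heartbeat
        if ((Get-VMIntegrationService -VM $vm -Name 'Heartbeat').Enabled)
        {
            while ((Get-VM -Name $VMName).Heartbeat -notlike 'Ok*')
            {
                if ((Get-Date) -gt $deadline) { throw "Timed out waiting for the heartbeat of $VMName" }
                Start-Sleep -Seconds 5
            }
        }

        # Wait until a PowerShell Direct session can actually be established
        $session = $null
        while (-not $session)
        {
            if ((Get-Date) -gt $deadline) { throw "Timed out waiting for PowerShell Direct to $VMName" }
            $session = New-PSSession -VMName $VMName -Credential $Credential -ErrorAction SilentlyContinue
            if (-not $session) { Start-Sleep -Seconds 5 }
        }

        # Run the provided script block, passing arguments through
        Invoke-Command -Session $session -ScriptBlock $ScriptBlock -ArgumentList $ArgumentList
        Remove-PSSession -Session $session
    }

    # Usage (illustrative):
    # Invoke-CommandWithPSDirect -VMName 'DC01' -Credential $domainCred -ScriptBlock { Install-WindowsFeature AD-Domain-Services }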

    This wrapper function first makes sure that the virtual machine is running, if not, the VM is started. If the heartbeat integration component is enabled for the VM, it will also wait for a proper heartbeat status – this resolves the multiple-reboot issue mentioned above. Afterwards, it waits for a proper PowerShell Direct connection. Both wait operations have time-outs to make sure script execution is not blocked perpetually. Finally, the provided script block is run passing through arguments.

    Solution #2 Determining when Active Directory is fully up and running

    Whenever a Domain Controller is restarted, it takes some time until the full AD functionality is available.  If you use a VMConnect session to look at the machine during this time, you will see the status message “Applying Computer Settings”.  Even with the Invoke-CommandWithPSDirect wrapper function above, I noticed some calls, like creating a new user or group, will fail during this time.

    In my script, I am therefore waiting for AD to be ready before continuing:
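    A minimal sketch of such a helper (the function name is hypothetical), built on the wrapper above and consistent with the description that follows; it assumes the ActiveDirectory module is available inside the domain controller VM:

    function Wait-ForActiveDirectory
    {
        param(
            [string]$VMName,
            [pscredential]$Credential
        )

        Invoke-CommandWithPSDirect -VMName $VMName -Credential $Credential -ScriptBlock {
            # Request the local computer's AD object until the call succeeds
            while ($true)
            {
                try
                {
                    Get-ADComputer -Identity $env:COMPUTERNAME -ErrorAction Stop | Out-Null
                    break
                }
                catch
                {
                    Start-Sleep -Seconds 10
                }
            }
        }
    }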

    This function leverages the Invoke-CommandWithPSDirect function to ensure the VM is up and running. To make sure that Active Directory works properly, it then requests the local computer’s AD object until this call succeeds.

    Using these two functions has saved me quite some headache. For additional tips, you can also take a look at Ben’s tips around variables and functions.

    Cheers,

    Lars

    PS: The full script for building the guarded fabric demo environment for Ignite 2016’s session BRK3124: Dive into Shielded VMs with Windows Server 2016 Hyper-V will be shared through our Virtualization Documentation GitHub.

    General Availability of Windows Server and Hyper-V Containers in Windows Server 2016


    The general availability of Windows Server 2016 marks a major milestone in our journey to bring world class container technologies to Windows customers. From the first time we showcased this technology at //build in 2015, through the first public preview with Technical Preview 3, to today’s general availability, our team has been hard at work creating and refining this technology.  So today we are excited that it is available for you to use in production.

    With each preview release we have tried hard to continue improving these technologies and today is no exception. We have increased the performance and density of our containers, lowered their start-up times and even added Active Directory support. For example, with Hyper-V Containers, we are now taking advantage of new cloning technology designed specifically to reduce start-up times and increase density. We heard your feedback, and we are excited to be expanding cross-SKU support so that you can run Windows Server Core containers using our Hyper-V Container technology, including on Windows 10 Anniversary Update, as well as Windows Server Containers with Nano Server on a Windows Server 2016 host installed with Server Core or Desktop. These are just a few of the enhancements we are bringing you with Windows Server 2016; our documentation site has more information on features as well as guides to get you started.

    Along with this release, in partnership with Docker Inc., the CS Docker Engine will also be available to Windows Server 2016 customers. This provides users of Windows Server 2016, at no additional cost, with enterprise support for Windows containers and Docker.  Please read more about this announcement on the Hybrid Cloud blog.

    In the coming days, we will also be releasing a OneGet provider which will simplify the experience of installing and setting up the Containers feature including the CS Docker Engine on Windows Server 2016 machines. Please stay tuned for more from our team and also remember to give us feedback on your experience in our forums, or head over to our UserVoice page if you have any feature requests.

    Ender Barillas
    Program Manager and Release Captain for Windows Server and Hyper-V Containers
