Advanced Hyper-V replication configuration

Hi,

In the last post, I presented the Capacity Planner for Hyper-V Replica, and in its documentation I discovered that this tool can suggest a value for the number of virtual machines to be transferred in parallel.

I quickly asked Google about this parameter and bingo! A very nice article from Microsoft describes several parameters to configure:

  • DisableCertRevocationCheck
  • MaximumActiveTransfers
  • ApplyVHDLimit
  • ApplyVMLimit
  • ApplyChangeReplicaDiskThrottle

All these parameters can be configured through the registry.
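As a minimal sketch of what this looks like (assuming the registry location documented in the Microsoft article; double-check the exact path and value types there before touching a production host), one of these values can be set from PowerShell:

    # Sketch: tune a Hyper-V Replica setting in the registry (run elevated).
    # The key path and value name follow the Microsoft article; verify them there.
    $path = "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\Replication"
    if (-not (Test-Path $path)) { New-Item -Path $path -Force | Out-Null }

    # Allow more parallel network transfers (DWORD value).
    Set-ItemProperty -Path $path -Name "MaximumActiveTransfers" -Value 4 -Type DWord

    # Restart the Virtual Machine Management service so the new value is read.
    Restart-Service vmms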

Enjoy!

Capacity Planner for Hyper-V Replica: a long story from SCCM!

Hi Geeks,

For a customer with about 1,500 users, I designed an SCCM 2012 platform using a single primary site, since there was no important subordinate site (to use as a secondary site or another primary site), with these elements:

  1. A site server on a DL 360 G7
  2. A site system server with duplicated roles on a DL 360 G7
  3. 2 SQL Servers configured with the AlwaysOn feature on 2 DL 360 G7

All right: for 1,500 users, the proposed architecture is highly available. However, the customer changed his mind: SCCM is so critical for him that he wants it available on the secondary site as well.

My challenge: with the same servers, I had to find a solution, since SCCM 2012 does not support disaster recovery capabilities.

So I thought about virtualization to offer:

  • High availability through a Hyper-V cluster
  • Disaster Recovery capabilities through Hyper-V Replica

The architecture has changed and the following schema describes the involved elements :

[Schema: SCCM architecture]

  • 2 servers used as Hyper-V cluster nodes. Each node can host two machines: SCCM (the primary site server) and SQL (also configured as a site system server with some duplicated roles)
  • 1 server as the SAN (yes!): the cluster was based on SMB 3!
  • 1 server as the Hyper-V Replica server
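To give an idea of the replica leg, here is a hedged sketch using the standard Hyper-V cmdlets; the VM and server names are hypothetical, and on a cluster the replica side is addressed through its Hyper-V Replica Broker:

    # Enable Hyper-V Replica for the SCCM VM towards the replica server
    # (names are hypothetical), keeping a few extra recovery points.
    Enable-VMReplication -VMName "SCCM" `
        -ReplicaServerName "hv-replica.contoso.local" `
        -ReplicaServerPort 80 `
        -AuthenticationType Kerberos `
        -RecoveryHistory 4

    # Send the initial copy over the network.
    Start-VMInitialReplication -VMName "SCCM"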

Very nice! The designed architecture was deployed successfully (ElhamdouliLLah). However, I encountered some issues with Hyper-V replication: it worked fine locally but suffered big disruptions over the WAN.

My problem was that I was not able to estimate the necessary resources (especially WAN bandwidth) for my workload.

Fortunately, Microsoft has released a great tool, the Capacity Planner for Hyper-V Replica, which can be downloaded from this link.


After configuring and running the tool, you can consult a rich report that covers (from the tool documentation):

1) Virtual Machine

The table lists a set of VMs and VHDs which were considered for capacity planning guidance.

2) Processor

The table captures the estimated CPU impact on the primary and replica servers, after enabling replication on the selected VMs.

3) Memory

The table captures the estimated memory requirements on the primary and replica servers, after enabling replication on the selected VMs.

4) IOPS

There are two tables in this section – one for the primary storage subsystem and the other for the replica storage subsystem. The attributes for the primary storage subsystem are:

a) Write IOPS before enabling replication – This captures the write IOPS observed across all the selected VMs for the duration of the run.

b) Estimated additional IOPS during initial replication – Once replication is enabled, the VHD is transferred to the replica server/cluster as part of the 'Initial Replication' (IR) operation, which can be completed over the network. The IOPS required during this period is captured in this row.

c) Estimated additional IOPS during delta replication – Once IR completes, Hyper-V Replica attempts to send the tracked changes every 5 minutes. The additional IOPS required during this operation is captured in this row.

The attributes for the replica storage subsystem are:

a) Estimated IOPS during IR – During the course of IR, the IOPS impact on the replica storage subsystem is captured in this row.

b) Estimated IOPS when only the latest point is preserved – When enabling replication, customers have the option to store only the latest recovery point or up to 15 additional recovery points (spaced at a 1-hour granularity). This row captures the IOPS impact when storing only the latest recovery point.

c) Estimated IOPS impact when multiple recovery points are used – This row captures the IOPS impact when replication is configured to store multiple recovery points. Hyper-V recovery snapshots are used to store each recovery point. The IOPS impact is independent of the number of points.

5) Storage

This section captures the disk space requirements on the primary and replica storage. The first table, which covers the primary storage subsystem, contains the following details:

a) Additional space required on the primary storage: Hyper-V Replica tracks the changes to the virtual machine in a log file. The size of the log file is proportional to the workload "churn". While the log file is being transferred (at the end of a replication interval) from the primary to the replica server, the next set of writes to the virtual machine is captured in another log file. This row captures the space required across all the replicating VMs.

b) Total churn in 5 minutes: This row captures the workload "churn" (the writes to the VM) across all the VMs on which replication will be enabled.

The following metrics are reported on the replica storage:

a) Estimated storage to store the initial copy: Irrespective of the replication configuration around additional points (latest only vs. storing more than one point), this row captures the storage required to store the initial copy.

b) Additional storage required on the replica server when only the latest recovery point is preserved: Over and above the storage required to store the initial copy, when replication is enabled with only the latest point, the tracked changes from the primary server are written to the replica VM directly. Storage (equal to the churn seen in a replication interval) is required to store the log file before it is written to the replica VM.

c) Additional storage required per recovery point on the replica server when multiple recovery points are preserved: Over and above the storage required to store the initial copy, each additional recovery point (stored as a Hyper-V snapshot on the replica server) requires additional space, which is captured in this row. This is an estimate based on the total VHD size across all the VMs; the final size depends on parameters such as the write pattern.

6) Network

The network parameters are captured in the table. These are:

a) Estimated WAN bandwidth between the primary and replica site: This is the input provided to the capacity planning tool.

b) Average network bandwidth required: Based on the workload churn observed during the run, this row captures the average network bandwidth required to meet Hyper-V Replica's attempt at sending the tracked changes every 5 minutes. This is a rough estimate, as factors not accounted for by this tool (such as compression of the payload, latencies in the network pipe, etc.) could impact the results.

c) MaximumActiveTransfers: In a multi-VM replication scenario, if the log file of each replicating VM is transferred sequentially, this could starve or delay the transmission of the change log file of some other replicating VM. On the other hand, if the change log files of all the replicating VMs are transferred in parallel, the transfer time of every VM would suffer due to network resource contention. In either scenario, the Recovery Point Objective (RPO) of the replicating VMs is affected. An optimal value for the number of parallel transfers is obtained by dividing the available WAN bandwidth by the TCP throughput of your link. The tool calculates the TCP throughput by replicating a temporary VM which it creates, and makes a recommendation for a registry key which is taken into account by Hyper-V Replica. It is worth noting that the value captures the number of parallel network transfers and *not* the number of VMs which are enabled for replication.
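To make that rule of thumb concrete, here is a small PowerShell sketch with purely hypothetical figures (the tool measures the real TCP throughput and churn for you):

    # Hypothetical figures for illustration only; the tool measures the real values.
    $wanBandwidthMbps  = 100    # provisioned WAN link between the sites
    $tcpThroughputMbps = 25     # per-connection TCP throughput measured by the tool
    $churnMBPer5Min    = 900    # total write churn across the replicating VMs

    # Average bandwidth needed to ship each 5-minute log within its interval (MB -> Mb).
    $avgMbps = ($churnMBPer5Min * 8) / 300                                       # -> 24 Mbps

    # Rule of thumb from the article: parallel transfers = WAN bandwidth / TCP throughput.
    $maxActiveTransfers = [math]::Floor($wanBandwidthMbps / $tcpThroughputMbps)  # -> 4

    Write-Output "Average bandwidth required: $avgMbps Mbps"
    Write-Output "Suggested MaximumActiveTransfers: $maxActiveTransfers"

If the suggested number of parallel transfers is lower than the number of replicating VMs, some VMs simply queue their log transfers, which is exactly the RPO trade-off described above.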

A great tool really!

 

 

InMon SFlow extension for Hyper-V 3 Part 2

Hi,

In the last post, I introduced the sFlow protocol and the solution developed by InMon for Hyper-V 3.

Let us see in this post how to install it.

First of all, we have to understand the architecture of the InMon solution described by this image:

As you can see, the InMon solution is a virtual switch extension. This new architecture provided by Hyper-V 3 allows us to plug custom processing into the network stack, for QoS purposes for instance, or to integrate new protocols, and so forth.
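You can see this extensibility directly from PowerShell: listing the extensions registered on a virtual switch (the switch name "External" is an assumption) shows the sFlow extension alongside the built-in ones once it is installed:

    # List the extensions registered on a virtual switch and their state.
    Get-VMSwitchExtension -VMSwitchName "External" |
        Select-Object Name, Vendor, Enabled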

Another important element is the agent, which is installed on each Hyper-V machine. Its purpose is to send the data collected from the virtual switch to a collector installed on another machine.

In our case, we'll use sFlow Trend, provided freely by InMon at this link.

And now let us start:

  • Download the SFlow extension from this link.

  • Install the SFlow extension and agent on each Hyper-V3 machine.
  • On the parameters page, you have to enter the collector machine, the polling interval and the sampling rate, as sFlow is based on the sampling principle. For this reason, the data provided does not reflect your network status exactly (see the sampling sketch after this list). These parameters can also be changed later through the registry.
  • After installing the agent, you have to select the appropriate virtual switches and, in the extensions part, check the sFlow Traffic monitoring parameter (a PowerShell equivalent is sketched after this list).
  • Download sFlow Trend and install it on the collector server.
  • Wait a few minutes, then launch the sFlow Trend console to discover the beautiful results.
  • Charts is one of the most important data views: it allows you to analyze your network according to the top connections and the corresponding protocols.
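For the virtual switch step above, here is a possible PowerShell equivalent. It is only a sketch: the switch name "External" is hypothetical, and the exact extension name may vary, so it is matched with a wildcard.

    # Enable the sFlow monitoring extension on a virtual switch.
    # "External" and the "*sFlow*" filter are assumptions; adjust them to your host.
    $ext = Get-VMSwitchExtension -VMSwitchName "External" |
           Where-Object { $_.Name -like "*sFlow*" }
    Enable-VMSwitchExtension -VMSwitchExtension $ext

And to see why sampled data does not reflect your network status exactly: sFlow extrapolates total traffic from the sample, so the result is a statistical estimate. A toy illustration with hypothetical numbers:

    # sFlow estimates total traffic by scaling the sampled bytes by the sampling rate.
    $samplingRate   = 400       # one packet sampled out of every 400
    $sampledBytes   = 1500000   # bytes observed in the samples during an interval
    $estimatedBytes = $sampledBytes * $samplingRate
    Write-Output "Estimated traffic: $estimatedBytes bytes (an estimate, not an exact count)"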

InMon SFlow extension for Hyper-V 3 Part 1

 

 

Hi,

Today, the trend in IT infrastructure is to go virtual, and this brings some specific aspects to monitor.

One of the most important elements to manage is network health, as it is the basis of communication between machines. Here, our main purpose is to analyze the bandwidth used by the machines. For this purpose, new protocols like NetFlow and sFlow have been defined.

What about SNMP? A very interesting answer can be found here:

  • SNMP can be used in real time (i.e. every second), and although NetFlow provides beginning and end times for each flow, it isn't nearly as real-time as SNMP. In fact, due to the active timeout issue, NetFlow really can't provide granularity finer than 1 minute; otherwise, it sort of defeats the idea of NetFlow's awesome aggregation. I think we are all learning about how important the active timeout is with the Cisco ASA.
  • NetFlow tells you who is consuming the bandwidth and with what; it is also much more verbose than SNMP, and therefore NetFlow exports consume much more disk space for historical information.
  • SNMP can be used to collect CPU and memory utilization, and that just isn't available yet using NetFlow. Notice I used the word 'yet': the future of NetFlow is very optimistic.

I recently worked on a very interesting project involving Hyper-V 2 and ESXi 5.0. I applied some best practices on the virtual machines to improve performance, like increasing the VMBus buffer size. As one switch presented some problems, I noticed a large number of dropped packets, and I searched for solutions that could help me find the origin of the issue.

This is how I encountered the sFlow protocol, which is used to monitor and analyze network flows. This protocol is described in this video:

As Hyper-V 3 defines an extensible virtual switch architecture, InMon has developed an sFlow extension that can be integrated in the vSwitch pipeline. This way, we can analyze the network flows of the machines connected to sFlow-enabled switches.

In Part 2, InchaEllah, I will describe the steps to install sFlow on Hyper-V 3.