Home Lab Power Automation – Part 2

In part 1, I shared some of the tools I use to execute the power-on and shutdown tasks in my lab. Today, let’s have a look at my startup PowerCLI script.

A Test-Connection Cmdlet Replacement

As I started working on the scripts, I needed a way to determine if hosts and devices were accessible on the network. Unfortunately, the Test-Connection cmdlet was not available in the Linux PowerShell Core release. It uses the Windows network stack to do its thing, so it may be a while before an equivalent gets ported to Linux. As an alternative, I created a simple Python script called pinghost.py that achieves the same overall result. You can find more detail on how it works in a post I did a few months back.

The script is very straightforward. You specify up to three space-separated IP addresses or host names as command line arguments, and the script will send one ICMP echo request to each of the hosts. Depending on the response, it will output either ‘is responding’ or ‘is not responding’ for each host. Below is an example:

pi@raspberrypi:~/scripts $ python pinghost.py vc.lab.local 172.16.10.67 172.16.10.20
vc.lab.local is not responding
172.16.10.67 is responding
172.16.10.20 is not responding
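
For anyone who doesn't want to dig up the earlier post, here is a minimal sketch of how a script like this could work. It is not the actual pinghost.py. It assumes a Linux system such as the Raspberry Pi, where the iputils ping command accepts -c (count) and -W (timeout in seconds), and the function names are placeholders chosen for illustration.

```python
import subprocess
import sys


def ping_once(host, timeout_s=2):
    """Send one ICMP echo request via the system ping; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ping binary missing or not executable: treat the host as unreachable
        return False
    return result.returncode == 0


def report(hosts, ping=ping_once):
    """Return one status line per host, using the wording shown in the example."""
    lines = []
    for host in hosts:
        state = "is responding" if ping(host) else "is not responding"
        lines.append(f"{host} {state}")
    return lines


if __name__ == "__main__":
    # Only the first three arguments are used, mirroring the three-host limit.
    for line in report(sys.argv[1:4]):
        print(line)
```

Passing the ping function in as a parameter keeps the formatting logic separate from the network call, which makes it easy to try out without touching the network.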

Then using this script, I could create sleep loops in PowerShell to wait for one or more devices to become responsive before proceeding.
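
One thing worth keeping in mind with a loop like this is that it will wait forever if the device never answers. As a rough illustration of the same idea with a timeout added, here is a sketch in Python (the same language as pinghost.py). The function name, the default interval and the ten-minute timeout are assumptions for illustration, not values taken from the scripts in this post.

```python
import time


def wait_until(check, interval_s=2.0, timeout_s=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every interval_s seconds until it returns True.

    Returns True on success, or False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Here check would be any function that returns True once the device answers, for example one that runs a single ping and inspects the result.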

Adding Timestamps to Script Output

As I created the scripts, I wanted to record the date and time of each event and each line of output. In a sense, I wanted it to look like a log that could be written to a file and referred to later if needed. To do this, I found a simple PowerShell filter that could be piped to each command I ran:

#PowerShell filter to add date/timestamps
filter timestamp {"$(Get-Date -Format G): $_"}
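
If you wanted a Python helper script to produce the same prefix on its own, without being piped through the PowerShell filter, a rough equivalent might look like the sketch below. The format is an assumption based on the sample output in this post; Get-Date -Format G follows the system's regional settings, so the exact layout may differ on your machine. The function name is hypothetical.

```python
from datetime import datetime


def timestamp(message, now=None):
    """Prefix a message with a date/time like '2018-08-23 1:21:49 PM: '."""
    now = now or datetime.now()
    hour12 = now.hour % 12 or 12              # 0 -> 12, 13 -> 1, and so on
    meridiem = "AM" if now.hour < 12 else "PM"
    return f"{now:%Y-%m-%d} {hour12}:{now:%M:%S} {meridiem}: {message}"
```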

Step 1 – Power On the Switch

Powering up the switch requires the use of the tplink_smartplug.py Python script that I discussed in part 1. The general idea here is to instruct the smart plug to set its relay to a state of ‘1’. This brings the switch to life. I then get into a do-until loop in PowerCLI until the Raspberry Pi is able to ping the management interface of the switch. More specifically, it will wait until the pinghost.py script returns a string of “is responding”. If that string isn’t received, it’ll wait two seconds, and then try again.

"Powering up the 10G switch ..." |timestamp
/home/pi/scripts/tplink-smartplug-master/tplink_smartplug.py -t 192.168.1.199 -c on |timestamp

"Waiting for 10G switch to boot ..." |timestamp
do
{
$pingresult = python ~/scripts/pinghost.py 172.16.1.1 |timestamp
$pingresult
sleep 2
} until($pingresult -like '*is responding*')

When run, the output looks similar to the following:

2018-08-23 1:21:49 PM: Powering up the 10G switch ...
2018-08-23 1:21:50 PM: Sent: {"system":{"set_relay_state":{"state":1}}}
2018-08-23 1:21:50 PM: Received: {"system":{"set_relay_state":{"err_code":0}}}
2018-08-23 1:21:50 PM: Waiting for 10G switch to boot ...
2018-08-23 1:22:00 PM: 172.16.1.1 is not responding
2018-08-23 1:22:05 PM: 172.16.1.1 is not responding
2018-08-23 1:22:11 PM: 172.16.1.1 is not responding
<snip>
2018-08-23 1:22:49 PM: 172.16.1.1 is not responding
2018-08-23 1:22:53 PM: 172.16.1.1 is responding
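
The Sent and Received lines above are the JSON messages exchanged with the smart plug. Purely to illustrate their shape, the command and the error check could be expressed in Python as in the sketch below. This is not code from tplink_smartplug.py, the function names are hypothetical, and treating state 0 as "off" and any non-zero err_code as a failure are assumptions on my part.

```python
import json


def relay_command(on):
    """Build the JSON command that sets the relay state (1 = on; 0 = off is assumed)."""
    payload = {"system": {"set_relay_state": {"state": 1 if on else 0}}}
    # Compact separators reproduce the exact text shown in the log output.
    return json.dumps(payload, separators=(",", ":"))


def relay_succeeded(reply):
    """True if the JSON reply reports err_code 0 for set_relay_state."""
    try:
        return json.loads(reply)["system"]["set_relay_state"]["err_code"] == 0
    except (ValueError, KeyError, TypeError):
        # Malformed or unexpected reply: treat it as a failure
        return False
```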

Step 2 – Power on Shared Storage

Next, I need to get shared storage online. This involves powering up my Dell T110 FreeNAS server and then waiting for it to come online before proceeding. Since the T110 has IPMI support, I use ipmitool to issue a ‘power on’ request. The script will then get into another do-until loop, waiting for pinghost.py to return a string of “is responding”.

"Powering up the freenas box via IPMI ..." |timestamp
ipmitool -I lanplus -H 172.16.1.67 -U root -P "VMware1!" power on |timestamp

"Waiting for freenas to boot ..." |timestamp
do
{
$pingresult = python ~/scripts/pinghost.py 172.16.1.17 |timestamp
$pingresult
sleep 5
} until($pingresult -like '*is responding*')

The output when run looks like the following:

2018-08-23 1:23:18 PM: Powering up the freenas box via IPMI ...
2018-08-23 1:23:18 PM: Chassis Power Control: Up/On
2018-08-23 1:23:18 PM: Waiting for freenas to boot ...
2018-08-23 1:23:21 PM: 172.16.1.17 is not responding
2018-08-23 1:23:30 PM: 172.16.1.17 is not responding
2018-08-23 1:23:38 PM: 172.16.1.17 is not responding
2018-08-23 1:23:47 PM: 172.16.1.17 is not responding
<snip>
2018-08-23 1:27:36 PM: 172.16.1.17 is not responding
2018-08-23 1:27:41 PM: 172.16.1.17 is responding

The FreeNAS box isn’t very quick to boot, and waiting for it is one of the longest steps in the process at about 4-5 minutes.
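
A side note on the ipmitool call: putting the password on the command line with -P leaves it in the script and visible in the process list. ipmitool also accepts -E, which reads the password from the IPMI_PASSWORD environment variable instead. Below is a minimal sketch of that approach in Python, assuming Python 3.7 or later; the function names are hypothetical and where the password is stored is left up to you.

```python
import os
import subprocess


def ipmi_power_on_args(bmc_host, user):
    """Build the ipmitool argument list; -E reads the password from IPMI_PASSWORD."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-E", "power", "on"]


def ipmi_power_on(bmc_host, user, password):
    """Run ipmitool with the password passed via the environment, not the command line."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        ipmi_power_on_args(bmc_host, user),
        env=env, capture_output=True, text=True,
    )
    # On success ipmitool prints a line such as "Chassis Power Control: Up/On"
    return result.stdout.strip()
```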

Step 3 – Power on Management ESXi Host

The management ESXi host, based on the Intel S2600CP platform, also supports IPMI. I use very similar scripting to power it up as I did with FreeNAS. It’s important that I boot the management host fully before any of the other hosts, as it runs important VMs that support the infrastructure (like AD, DNS, vCenter, etc.). All of these VMs are configured to auto-start when the ESXi host boots.

"Powering up esx-m1 management host ..." |timestamp
ipmitool -I lanplus -H 172.16.1.60 -U ADMIN -P "ADMIN" power on |timestamp

"Waiting for esx-m1 to boot ..." |timestamp
do
{
$pingresult = python ~/scripts/pinghost.py 172.16.1.20 |timestamp
$pingresult
sleep 5
} until($pingresult -like '*is responding*')

"The esx-m1 management host is up ..." |timestamp

This host takes about two minutes to come up:

2018-08-23 1:27:46 PM: Powering up esx-m1 management host ...
2018-08-23 1:27:46 PM: Chassis Power Control: Up/On
2018-08-23 1:27:46 PM: Waiting for esx-m1 to boot ...
2018-08-23 1:27:50 PM: 172.16.1.20 is not responding
2018-08-23 1:27:58 PM: 172.16.1.20 is not responding
2018-08-23 1:28:07 PM: 172.16.1.20 is not responding
<snip>
2018-08-23 1:29:31 PM: 172.16.1.20 is not responding
2018-08-23 1:29:37 PM: 172.16.1.20 is responding
2018-08-23 1:29:42 PM: The esx-m1 management host is up ...

Step 4 – Powering up Compute ESXi hosts

At this point, the management ESXi host has already booted up, and it should be running its list of auto-start VMs. The AD and DNS VMs will start in a minute or two, but there is no reason I can’t get a head start on powering up the compute hosts now.

"Powering on all hosts in compute-a using IPMI ..." |timestamp
ipmitool -I lanplus -H 172.16.1.61 -U ADMIN -P VMware1! chassis power on |timestamp
sleep 2
ipmitool -I lanplus -H 172.16.1.62 -U ADMIN -P VMware1! chassis power on |timestamp
sleep 2
ipmitool -I lanplus -H 172.16.1.63 -U ADMIN -P VMware1! chassis power on |timestamp

Three compute nodes are powered on with a two-second delay between each. This is done just to avoid a large draw in current as they all come up simultaneously. You’ll notice that I don’t wait to see if they’ve booted as I did with FreeNAS and the management host. This is because I’ll still be waiting a while for vCenter to come online. I’ll be coming back to check on these hosts in a bit.
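
The staggering pattern itself is simple enough to express as a small helper. Here is a sketch in Python, with a hypothetical function name, that runs an action for each host and pauses between hosts but not after the last one:

```python
import time


def staggered(hosts, action, delay_s=2.0, sleep=time.sleep):
    """Run action(host) for each host, pausing delay_s between (not after) hosts."""
    results = []
    for index, host in enumerate(hosts):
        if index > 0:
            sleep(delay_s)
        results.append(action(host))
    return results
```

In this post's case the action would be whatever issues the IPMI power-on request for a single host.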

Step 5 – Waiting for Important VMs to Boot

After doing some test runs, it became clear that vCenter Server took the longest to start up of all the auto-start VMs. At this point, I decided to give it sufficient time to at least be reachable on the network before proceeding with the script. I used the pinghost.py script again to wait for it to come up.

"Waiting for auto-start VMs to boot, including vCenter ..." |timestamp
do
{
$pingresult = python ~/scripts/pinghost.py 172.16.1.15 |timestamp
$pingresult
sleep 5
} until($pingresult -like '*is responding*')

Once vCenter could be reached, I was confident that all other important auto-start VMs were also online. In theory, I could improve this step by checking on other VMs as well.

Step 6 – Checking in on the Compute Nodes

While waiting for vCenter to come up, the compute nodes should have come up. I do another quick ping test to make sure all three are reachable before proceeding.

"Make sure compute-a hosts finished booting before we proceed ..." |timestamp
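do
{
# $pingresult holds one line per host here. Against an array, -like returns the
# matching lines, so the loop only ends once no line contains 'not responding'.
# (-NotLike would return the lines that pass, and end the loop if any one host is up.)
$pingresult = python ~/scripts/pinghost.py 172.16.1.21 172.16.1.22 172.16.1.23 |timestamp
$pingresult
sleep 2
} until(-not ($pingresult -like '*not responding*'))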

As long as pinghost.py doesn’t return a “not responding” string, I know all three nodes are up, because it must have returned “is responding” for all three. Every time I’ve run through this script, the three ESXi hosts have been online by the time vCenter comes up. Nonetheless, it’s good to confirm.
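
The "all three are up" test boils down to two conditions: there is some output, and none of it says "not responding". Sketched in Python for clarity (the function name is hypothetical):

```python
def all_responding(lines):
    """True only if there is output and no line reports 'is not responding'."""
    lines = list(lines)
    return bool(lines) and not any("is not responding" in line for line in lines)
```

The important detail is that every line has to pass. A check that is satisfied by any single "is responding" line would let the script continue with only one host up.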

Step 7 – Waiting for Services

This is the one step I feel is a bit wasteful. If anyone knows of a better way to determine that vCenter’s services are up and functional, please let me know.

Because we’ll be using PowerCLI commands issued to vCenter shortly, it’s important that VC be fully functional before proceeding. Based on some test runs, I determined that about 3-4 minutes are needed before VC services stabilize. To be safe, I simply let the script wait a full five minutes before proceeding:

"vCenter is up, but let's give it 5 minutes for services to stabilize ..." |timestamp
sleep 300
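
One possible refinement, offered as a sketch rather than a tested replacement for the fixed wait, is to poll vCenter's HTTPS port (443) and only start the clock once it accepts connections. An open port only tells you the web front end is listening, not that every service behind it has finished starting, so this could shorten the wait but probably not eliminate it. The function name and defaults below are assumptions.

```python
import socket


def port_open(host, port=443, timeout_s=3.0):
    """True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable or timed out: the port is not accepting connections
        return False
```

Called in a loop with a short sleep, port_open("172.16.1.15") would report when vCenter starts answering on 443.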

Step 8 – Powering Up all VMs

The last step I need to do is to connect to vCenter using PowerCLI, and then start up all of the VMs in the compute-a cluster.

"Connecting to vCenter ..." |timestamp
Connect-VIServer -Server 172.16.1.15 -User administrator@vsphere.local -Password "VMware1!"

"Powering on all VMs in compute-a ..." |timestamp
$vmList = Get-VM -Location compute-a | where {$_.PowerState -eq 'PoweredOff'}
foreach ($vm in $vmList)
{
Start-VM -VM $vm -Confirm:$false |Format-List -Property Name, VMHost, PowerState
}

"Disconnecting from VC ..." |timestamp
Disconnect-VIServer -Server * -Force -Confirm:$false

"Finished!" |timestamp

As you can see above, I establish a Connect-VIServer connection to VC, then get into a ‘foreach’ loop, powering on all VMs that currently have a power state of ‘PoweredOff’. Some VMs, like NSX edges, will auto-start, so to avoid errors I only try to power on machines that are not already on.

Conclusion

I’ll share the shutdown process in Part 3 in a day or two.

Have some suggestions on how I could improve these scripts? Please leave a comment below or reach out to me on Twitter (@vswitchzero).

Home Lab Power Automation Series:

Part 1 – The Tools
Part 2 – Powering Everything On
Part 3 – Shutting Everything Down

One thought on “Home Lab Power Automation – Part 2”

  1. Wow, brilliant, very helpful stuff.
    So where are you running this script? Through VMware vSphere PowerCLI?
    Then do you have any script to update VMware Tools?
    And for deploying multiple machines?
