Common
Failures
Common failures and patterns that
cause live-migrations to be unsuccessful are the following:
- vMotion network connectivity issues
- ESXi hosts cannot ping or timeout after 20 seconds.
- MTU mismatch between the VMkernel interface and the network layer (i.e. switches and routers)
- Misconfiguration on what networks are vMotion enabled on the ESXi hosts
- Storage
- Datastore is unreachable or experiencing an APD (All Paths Down)
- I/O’s timeout of 20 seconds or more
- vMotion successful, but guest issues occur
- The VM network is not reachable – No layer-2 connectivity on the destination ESXi host
- Resources are overcommitted
- Cannot allocate memory for a long time
- Swapping takes a long time leading to a vMotion timeout
Verify that you have met all the vMotion requirements. That includes the primary
reason for vMotion issues which is network connectivity or MTU mismatches. If
you are unable to pinpoint the issue, it might be useful to look into the vSphere logs to see what happened.
vSphere
Logs
Digging deeper into the vSphere logs
can be useful to verify what is causing the live-migration to fail. The vMotion
process uses both vCenter Server and ESXi components. There are five processes
involved that log into five different log files.
The vCenter Daemon (VPXD), vCenter
Agent (VPXA) and Host Daemon (hostd) are the control plane for the vMotion
process. The ‘data plane’ is in the VMX process and the VMkernel. The first
step is to find the Operation ID (opID) in these logs. That opID maps to a
Migration ID. Using these two identifiers, we can follow a vMotion process
looking at the corresponding log files.
Operation
ID
Find the “Operation ID” of the
vMotion process in the VPXD logs on the vCenter Server instance. You can
connect to the console of the VCSA (vCenter Server Appliance) instance and go
to the bash shell. You can issue command:
grep "relocate" /var/log/vmware/vpxd/vpxd-*.log
| grep BEGIN
The output of this commands exposes
a log entry that contains the Operation ID:
/var/log/vmware/vpxd/vpxd-214.log:2019-09-24T15:38:10.042Z
info vpxd[29243] [Originator@6876 sub=vpxLro opID=jzlgfw8g-11824-auto-94h-h5:70003561-20]
[VpxLRO] — BEGIN task-8031 — vm-269 — vim.VirtualMachine.relocate —
52b17681-35d1-998b-da39-de07b7925dda(520681db-25b7-c5d6-44d7-6a056620e043)
Migration
ID
Now you retrieved the Operation ID
(opID), you can use the opID to find the Migration ID in the hostd log file on
the ESXi host. It helps when you know the source and destination ESXi hosts
that are involved. This example is a compute and storage vMotion
(XvMotion) using the Operation ID that we found using the previous command.
grep jzlgfw8g-11824-auto-94h-h5:70003561-20
/var/log/hostd.log | grep -i migrate
Source ESXi host output:
2019-09-24T15:38:47.070Z info
hostd[2100388] [Originator@6876 sub=Vcsvc.VMotionSrc.3117907752192811422
opID=jzlgfw8g-11824-auto-94h-h5:70003561-20-01-10-e036 user=vpxuser:VSPHERE.LOCAL\Administrator]
VMotionEntry: migrateType = 1
2019-09-24T15:38:47.072Z info hostd[2100388] [Originator@6876
sub=Vmsvc.vm:/vmfs/volumes/b86202d8-fb958817-0000-000000000000/NH-DC-02/NH-DC-02.vmx
opID=jzlgfw8g-11824-auto-94h-h5:70003561-20-01-10-e036
user=vpxuser:VSPHERE.LOCAL\Administrator] VigorMigrateNotifyCb:: hostlog state
changed from success to none
Destination ESXi host output:
2019-09-24T15:38:47.136Z info
hostd[2099828] [Originator@6876 sub=Vcsvc.VMotionDst.3117907752192811422
opID=jzlgfw8g-11824-auto-94h-h5:70003561-20-01-2c-3c35
user=vpxuser:VSPHERE.LOCAL\Administrator] VMotionEntry: migrateType = 1
VMkernel
entries
Now that we have the Migration ID,
we can use that to extract information about the vMotion process for this
specific live-migration in the vmkernel log files. Use the following command on
both the source and destination ESXi host:
grep VMotion /var/log/vmkernel*
The output reveals several
interesting data points on the vMotion process like the average bandwidth used
for transmitting data.
Source ESXi host output:
2019-09-24T15:38:47.402Z
cpu13:46101760)VMotionUtil: 5199: 3117907752192811422 S: Stream
connection 1 added.
2019-09-24T15:38:47.445Z cpu8:46092538)XVMotion: 2062: Allocating pool 0.
2019-09-24T15:38:47.445Z cpu8:46092538)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:38:47.445Z cpu8:46092538)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
2019-09-24T15:38:47.445Z cpu8:46092538)VMotion: 8134: 3117907752192811422 S: Requiring explicit resume handshake.
2019-09-24T15:38:47.445Z cpu13:46101760)XVMotion: 3384: 3117907752192811422 S: Starting XVMotion stream.
2019-09-24T15:39:34.622Z cpu14:46101762)VMotion: 5426: 3117907752192811422 S: Disk copy complete, no bandwidth estimate.
2019-09-24T15:39:34.905Z cpu24:46092539)VMotion: 5281: 3117907752192811422 S: Stopping pre-copy: only 37 pages left to send, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~185.134 MB/s, 1536000% t2d)
2019-09-24T15:39:34.965Z cpu13:46101760)VMotionSend: 5095: 3117907752192811422 S: Sent all modified pages to destination (network bandwidth ~281.100 MB/s)
2019-09-24T15:39:35.140Z cpu15:46101757)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:39:35.140Z cpu15:46101757)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
2019-09-24T15:38:47.445Z cpu8:46092538)XVMotion: 2062: Allocating pool 0.
2019-09-24T15:38:47.445Z cpu8:46092538)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:38:47.445Z cpu8:46092538)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
2019-09-24T15:38:47.445Z cpu8:46092538)VMotion: 8134: 3117907752192811422 S: Requiring explicit resume handshake.
2019-09-24T15:38:47.445Z cpu13:46101760)XVMotion: 3384: 3117907752192811422 S: Starting XVMotion stream.
2019-09-24T15:39:34.622Z cpu14:46101762)VMotion: 5426: 3117907752192811422 S: Disk copy complete, no bandwidth estimate.
2019-09-24T15:39:34.905Z cpu24:46092539)VMotion: 5281: 3117907752192811422 S: Stopping pre-copy: only 37 pages left to send, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~185.134 MB/s, 1536000% t2d)
2019-09-24T15:39:34.965Z cpu13:46101760)VMotionSend: 5095: 3117907752192811422 S: Sent all modified pages to destination (network bandwidth ~281.100 MB/s)
2019-09-24T15:39:35.140Z cpu15:46101757)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:39:35.140Z cpu15:46101757)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
Destination ESXi host output:
2019-09-24T15:38:47.397Z
cpu15:2099283)VMotionUtil: 5199: 3117907752192811422 D: Stream connection 1
added.
2019-09-24T15:38:47.440Z cpu3:3543984)XVMotion: 2062: Allocating pool 0.
2019-09-24T15:38:47.440Z cpu3:3543984)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:38:47.440Z cpu3:3543984)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
2019-09-24T15:38:47.440Z cpu3:3543984)VMotion: 8134: 3117907752192811422 D: Requiring explicit resume handshake.
2019-09-24T15:39:34.907Z cpu3:3543984)VMotionRecv: 761: 3117907752192811422 D: Estimated network bandwidth 205.138 MB/s during pre-copy
2019-09-24T15:39:34.960Z cpu3:3543984)VMotionRecv: 2961: 3117907752192811422 D: DONE paging in
2019-09-24T15:39:34.960Z cpu3:3543984)VMotionRecv: 2969: 3117907752192811422 D: Estimated network bandwidth 200.096 MB/s during page-in
2019-09-24T15:39:35.059Z cpu6:3543972)VMotion: 6675: 3117907752192811422 D: Received all changed pages.
2019-09-24T15:39:35.067Z cpu1:3543972)VMotion: 6454: 3117907752192811422 D: Resume handshake successful
2019-09-24T15:39:35.082Z cpu6:3543980)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:39:35.082Z cpu6:3543980)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
2019-09-24T15:38:47.440Z cpu3:3543984)XVMotion: 2062: Allocating pool 0.
2019-09-24T15:38:47.440Z cpu3:3543984)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:38:47.440Z cpu3:3543984)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
2019-09-24T15:38:47.440Z cpu3:3543984)VMotion: 8134: 3117907752192811422 D: Requiring explicit resume handshake.
2019-09-24T15:39:34.907Z cpu3:3543984)VMotionRecv: 761: 3117907752192811422 D: Estimated network bandwidth 205.138 MB/s during pre-copy
2019-09-24T15:39:34.960Z cpu3:3543984)VMotionRecv: 2961: 3117907752192811422 D: DONE paging in
2019-09-24T15:39:34.960Z cpu3:3543984)VMotionRecv: 2969: 3117907752192811422 D: Estimated network bandwidth 200.096 MB/s during page-in
2019-09-24T15:39:35.059Z cpu6:3543972)VMotion: 6675: 3117907752192811422 D: Received all changed pages.
2019-09-24T15:39:35.067Z cpu1:3543972)VMotion: 6454: 3117907752192811422 D: Resume handshake successful
2019-09-24T15:39:35.082Z cpu6:3543980)XVMotion: 642: Bitmap page: len = 16384, pgLen = 1, bitSet = 3001, bitClear = 13383.
2019-09-24T15:39:35.082Z cpu6:3543980)XVMotion: 642: Bitmap block: len = 131072, pgLen = 4, bitSet = 131072, bitClear = 0.
Virtual
Machine log file entries
The virtual machine log file resides
in the virtual machine home folder that also includes the vmx file and the vmdk
files. Using root access on your ESXi host, you can go to appropriate folder.
My test VM resides on vSAN, so I would look for the virtual machine home
directory using symlink /vmfs/volumes/vsanDatastore. Using the following
command shows even more information about the live-migration like the source
and destination IP addresses:
grep "3117907752192811422"
vmware.log
Output:
2019-09-24T15:38:47.280Z| vmx| I125:
Received migrate ‘from’ request for mid id 3117907752192811422, src ip
<192.168.200.93>.
2019-09-24T15:38:47.280Z| vmx| I125: MigrateSetInfo: state=8 srcIp=<192.168.200.93> dstIp=<192.168.200.91> mid=3117907752192811422 uuid=4c4c4544-004a-4c10-8044-c7c04f4d4e32 priority=high
2019-09-24T15:38:47.282Z| vmx| I125: MigID: 3117907752192811422
2019-09-24T15:39:34.971Z| vmx| I125: Migrate_Open: Restoring from <192.168.200.93> with migration id 3117907752192811422
2019-09-24T15:39:35.023Z| vmx| I125: Migrate_Open: Restoring from <192.168.200.93> with migration id 3117907752192811422
2019-09-24T15:38:47.280Z| vmx| I125: MigrateSetInfo: state=8 srcIp=<192.168.200.93> dstIp=<192.168.200.91> mid=3117907752192811422 uuid=4c4c4544-004a-4c10-8044-c7c04f4d4e32 priority=high
2019-09-24T15:38:47.282Z| vmx| I125: MigID: 3117907752192811422
2019-09-24T15:39:34.971Z| vmx| I125: Migrate_Open: Restoring from <192.168.200.93> with migration id 3117907752192811422
2019-09-24T15:39:35.023Z| vmx| I125: Migrate_Open: Restoring from <192.168.200.93> with migration id 3117907752192811422
This blog post is to show what log
entries are used, even though the exemplary vMotion operation was successful.
The shown log entries include a lot of detailed information that take you
through the process. If something fails, these logs will be a good resource to
look for a possible cause.
Thanks for https://blogs.vmware.com
Hi, Amazing your article you know I'm too lazy to sign up an account just for comment your article. it's really good and helping dude. thanks! Vmware Training Course in Delhi
ReplyDeleteWow this is amazing blog and i am very happy to read your blog. If you want to know about vmware active-active data center design then visit our site for more details.
ReplyDeleteyou have written an excellent blog.. keep sharing your knowledge...
ReplyDeleteVMware Training in Chennai
VMware Training in Bangalore
ReplyDeleteAwesome blog. Thanks for sharing such a worthy information....
Ethical Hacking Course in Bangalore
Ethical Hacking Course in Pune
Ethical Hacking Course in Hyderabad
Ethical Hacking Course in Delhi
Ethical Hacking Course in Trivandrum
Ethical Hacking Course in Cochin
Ethical Hacking Course in Ahmedabad
Ethical Hacking Course in Kolkata
Ethical Hacking Course in Mumbai
Best Free Virtual Machine Software for Windows and Mac
ReplyDeleteVMWare vSphere
ReplyDeleteStay ahead of cyber threats with EC Council Certified Ethical Hacker training in Hyderabad, offered by Indian Cyber Security Solutions. Learn to identify vulnerabilities, conduct ethical hacking assessments, and implement robust security measures.Enroll now to become a certified ethical hacker.
ReplyDeleteGreat breakdown of common vMotion failures! For anyone serious about mastering these technical challenges, I highly recommend the Ethical Hacking Training in Hyderabad by Indian Cyber Security Solutions. It's an excellent way to enhance your cybersecurity skills. Check it out here: Ethical Hacking Training in Hyderabad.
ReplyDeletehttps://indiancybersecuritysolutions.com/ethical-hacking-training-in-hyderabad/
Great insights on common failures during vMotion live migrations! It’s crucial to ensure all network configurations and resources are properly set up. If you're interested in deepening your knowledge in network security and preventing such issues, consider the Ethical Hacking Training in Hyderabad by Indian Cyber Security Solutions. This course equips you with the skills to secure networks effectively.
ReplyDeleteGreat insights into common vMotion failures! For those looking to enhance their cybersecurity skills, consider the Ethical Hacking Training in Hyderabad by Indian Cyber Security Solutions. It's a comprehensive program that covers critical aspects of ethical hacking. Learn more here: Ethical Hacking Training in Hyderabad.
ReplyDeletehttps://indiancybersecuritysolutions.com/ethical-hacking-training-in-hyderabad/