See updates below for what I've discovered about the more general nature of this problem.
I have a Windows VM running OVMF firmware with VFIO passthrough for a graphics card, hosted on an Arch Linux machine (QEMU/KVM via libvirt). I am trying to enable Windows Sandbox within that VM. Microsoft's documentation indicates that running Sandbox within an L1 VM should be possible, although it is written with a Hyper-V hypervisor in mind.
When I enable Windows Sandbox in my VM and reboot as part of its install process, the VM fails about half a second into the boot process. It does this twice, then attempts startup repair. Here's a video that shows what happens (with apologies for quality). After the video ends, Automatic Repair tells me that my PC couldn't start correctly.
If I get to the recovery command line from that environment and run `dism /image:D:\ /disable-feature /featurename:containers-disposableclientvm` (i.e. disable Windows Sandbox), I am able to boot into my system just fine. On my system this problem is consistently reproducible.
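For reference, the exact recovery-environment command (the drive letter may differ; `D:` is just where WinRE happens to mount my Windows volume):

```
REM From the recovery environment's command prompt. Confirm the drive letter
REM with diskpart's "list volume" first; D: is my Windows volume here.
dism /image:D:\ /disable-feature /featurename:Containers-DisposableClientVM
```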
The docs indicate that Windows Sandbox will not be an enabled option in the Windows Features dialog if it's not supported in the machine configuration, and if I set the host kernel parameter `kvm_intel.nested=0` it does indeed become grayed out.
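For context, here is roughly how I check and toggle that on the host (the modprobe.d file name is arbitrary, and passing `kvm_intel.nested=0` on the kernel command line works as well):

```
# Check whether kvm_intel currently exposes nested virtualization
cat /sys/module/kvm_intel/parameters/nested    # Y/1 when enabled

# Disable it persistently; reload the module or reboot afterwards
echo "options kvm_intel nested=0" | sudo tee /etc/modprobe.d/kvm_intel.conf
```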
This problem initially occurred on Windows 10 Pro version 1909, and it persists after updating to version 2004 (build 19041.388).
My question, then, is: How can I get an error message instead of a messageless crash, and/or how can I cause it not to crash? I'm finding that it's hard to Google this circumstance without an error message or useful log entries, and I'm not sure where to go from here.
Things I've tried:
- Guest OS update from 1909 to 2004 as described above, to no avail.
- Removing the GPU and USB passthroughs and running with QXL + Spice graphics. This did not help.
- Copying the virtual disk onto a hard drive and booting from it on hardware directly (i.e. removing the Linux hypervisor from the equation). This works, and I am able to enable Windows Sandbox and start a sandbox VM as expected; however, it does not solve my problem. To me this confirms the issue has to do with an interaction with my hypervisor, not with the Windows install itself.
- `<cpu>` tag config variants, all to no avail (see the libvirt domain XML file linked below; these are sketched just after this list):
  - `mode='host-passthrough' check='partial' migratable='on'`
  - `mode='host-passthrough' check='partial' migratable='off'`
  - `mode='host-model' check='partial'`
- Q35 machine type instead of i440FX, to no avail. Unfortunately I don't have the exact config I used on hand, but I switched the `domain/os/type[machine]` attribute in the XML, changed some PCI controllers to PCIe controllers, and added some PCIe controllers, with no other tweaks.
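For reference, the variants above look roughly like this in the domain XML (a sketch from memory rather than my verbatim config; the exact Q35 machine string is only an example):

```
<!-- CPU config variants tried, one at a time: -->
<cpu mode='host-passthrough' check='partial' migratable='on'/>
<cpu mode='host-passthrough' check='partial' migratable='off'/>
<cpu mode='host-model' check='partial'/>

<!-- Machine type switched from i440FX to Q35 in domain/os/type: -->
<os>
  <type arch='x86_64' machine='pc-q35-5.0'>hvm</type>
</os>
```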
Configuration:
Host uname -a: Linux myhostnamehere 5.7.9-arch1-1 #1 SMP PREEMPT Thu, 16 Jul 2020 19:34:49 +0000 x86_64 GNU/Linux
libvirt domain config: Pastebin. I've redacted some IDs but this is otherwise verbatim.
Nothing interesting appears in host dmesg when these crashes happen.
QEMU version 5.0.0-7, libvirt version 6.5.0-1. (These are package manager version tags.)
Guest: Windows Version 2004 (OS Build 19041.450), Pro edition
Update:
(Added guest OS version in configuration section above.)
I have tried the following additional things, and I've updated question tags and title accordingly.
- On my host, tested nested virtualization on a Linux guest. I successfully ran Fedora 29 inside Ubuntu 20.04 on my Arch host, so nested virtualization itself on my hardware and L0 configuration is not a problem.
- Tried the following additional Windows features. (These are the names DISM uses; the exact commands are sketched after this list.) All of them, upon the reboot I perform during feature installation, produced the same early-boot crash-and-reboot symptoms described above:
  - `Microsoft-Hyper-V-All`
  - `VirtualMachinePlatform`
  - `HypervisorPlatform` (this one installs successfully without a reboot, but produces the failure symptoms once I do reboot)
- The `Containers` Windows feature enables without drama.
- Running `bcdedit /set hypervisorlaunchtype Auto` (with `Containers` enabled, for what it's worth) causes the same crash symptoms as above, and deleting that entry fixes it, just as with the Windows features.
- Running `bcdedit /set hypervisorlaunchtype Off` (with any of the above problematic features enabled, either before rebooting or during reboot recovery) avoids the crash, but prevents me from using virtualization features like Windows Sandbox ("No hypervisor was found" error).
- Running a separate Windows VM with a simpler config, notably using BIOS firmware instead of the OVMF UEFI firmware, still jumps to automatic boot repair when I enable Windows Sandbox.
- Setting the host kernel parameter `kvm_intel.pml` (which defaults to yes) to no has no effect and the problem remains. (Some posts online reference an older problem relating to this parameter, which is supposed to have been fixed. In any case this tweak didn't help me; see the host-side snippet after this list.)
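For completeness, the guest-side commands referenced in the bullets above, roughly as I run them from an elevated command prompt:

```
REM Feature installs, by the DISM names listed above
dism /online /enable-feature /featurename:Containers-DisposableClientVM
dism /online /enable-feature /featurename:Microsoft-Hyper-V-All
dism /online /enable-feature /featurename:VirtualMachinePlatform
dism /online /enable-feature /featurename:HypervisorPlatform
dism /online /enable-feature /featurename:Containers

REM Hypervisor launch type toggles (the delete variant undoes the Auto setting)
bcdedit /set hypervisorlaunchtype Auto
bcdedit /set hypervisorlaunchtype Off
bcdedit /deletevalue hypervisorlaunchtype
```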
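The `pml` tweak follows the same pattern as the `nested` parameter earlier; for reference (a sketch, with an arbitrary modprobe.d file name):

```
# Check the current Page Modification Logging setting
cat /sys/module/kvm_intel/parameters/pml    # Y by default

# Turn it off; takes effect after the module is reloaded or the host reboots
echo "options kvm_intel pml=0" | sudo tee -a /etc/modprobe.d/kvm_intel.conf
```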
Based on this, I'm confident summarizing the problem as: despite the hardware and host kernel supporting nested virtualization, Windows guests on my host cannot themselves run a hypervisor. Microsoft documents support for nested virtualization, albeit in Hyper-V environments. What do I need to do to get this working in my non-Hyper-V environment, or at least to surface an error explaining why it isn't?