Hey folks. It’s been a while since I’ve posted anything here, but I recently ran into something worth sharing. A customer running Horizon 7.12 hit an issue when trying to Recompose a pool.
Background: The customer had installed Horizon 7.12 and created a pool of GPU-enabled desktops using NVIDIA GRID. Unfortunately, two months into the deployment, he found that the virtual desktops (VDs) were experiencing lag, screen artifacts and overall slowness. The admin did some research and found more optimal settings for the pool (not going to discuss the changes here), including “allocate all memory” for the VD pool to help with the video processing. After he made the changes to the base image and took a snapshot, the admin kicked off the Recompose on the pool and got a very vague “An Internal Error Occurred”, and that was it. No useful errors in the vCenter or Horizon console at all. The Recompose on the pool was just failing.
Troubleshooting: What’s the first thing you do in this situation? Pull a log bundle on Horizon and find the problem!! Well, that’s what we did: the customer noted the time when the Recompose failed and sent that along with the log bundle for review. Digging through the logs, there was nothing obviously failing. Falling back on my favorite “needle in a haystack” analysis style, I pulled up BareGrep and started doing targeted grepping of the log bundle for “fail”, “error”, “internal error”, etc. What I came up with in a Debug log was a _literally_ cryptic message (highlighted in red):
2020-10-30T15:13:49.440-05:00 TRACE (2564-1EF8) [Event] Raising windows event ([VLSI_DESKTOP_RECOMPOSE_FAILED] “domain.org\username failed to request a recompose of 99 machine(s) in desktop Graphics users no video card. Full Adobe Suite”: Node=server.domain.org, DesktopId=graphicsusers, Severity=AUDIT_FAIL, Time=Fri Oct 30 15:13:49 CDT 2020, MachinesCount=99, ViewAPIDesktopId=Desktop/Yjg3YTVlNTYtNTVhMy00YWIxLTkyOTEtMTc3YjAxMThmOTZl/Z3JhcGhpY3N1c2Vycw, DesktopDisplayName=Graphics users no video card. Full Adobe Suite, Source=com.vmware.vdi.vlsi.server.resources.DesktopViewComposerManager, UserSID=########, Module=Vlsi, UserDisplayName=#####.###\######, Acknowledged=true)
2020-10-30T15:13:49.440-05:00 ERROR (2564-2210) [RestApiServlet] Unexpected fault:(vdi.fault.EntityNotFound) {
errorMessage = BaseImageVm does not exist on VC VirtualCenter/Yjg3YTVlNTYtNTVhMy00YWIxLTkyOTEtMTc3YjAxMThmOTZl/MjY3NzJkN2QtODBkYi00OWI1LTkxMmMtMTM0MDNjZTY1OGEw,
id = (vdi.EntityId) {
dynamicType = null,
dynamicProperty = null,
id = BaseImageVm/Yjg3YTVlNTYtNTVhMy00YWIxLTkyOTEtMTc3YjAxMThmOTZl/MjY3NzJkN2QtODBkYi00OWI1LTkxMmMtMTM0MDNjZTY1OGEw/L0RhdGFjZW50ZXIvdm0vVkRJIC0gR3JhcGhpY3M/dm0tMTA5
}
} for uri /view-vlsi/rest/v1/desktop/recompose
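If you don't have BareGrep handy, the keyword sweep that turned up the entry above can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: the function name `scan_bundle`, the `.log`/`.txt` file filter and the optional timestamp prefix are mine, not anything Horizon or BareGrep provides.

```python
import re
from pathlib import Path

# Keywords from the post; the case-insensitive match also catches "FAILED", "ERROR", etc.
KEYWORDS = ("fail", "error", "internal error")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def scan_bundle(root, time_prefix=""):
    """Yield (path, line_number, line) for keyword hits in *.log/*.txt files under root.

    time_prefix is an optional timestamp prefix such as "2020-10-30T15:13",
    used to narrow the hits to the minute the failure was reported.
    """
    for path in sorted(Path(root).rglob("*")):
        # Assumption: only plain-text log files are of interest.
        if not path.is_file() or path.suffix.lower() not in {".log", ".txt"}:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if time_prefix and not line.startswith(time_prefix):
                    continue
                if PATTERN.search(line):
                    yield path, number, line.rstrip("\n")
```

Point it at the extracted bundle folder and, if you like, the minute of the failure, for example `scan_bundle("extracted-bundle", "2020-10-30T15:13")`, where the folder name is a placeholder. One caveat: with a timestamp prefix set, continuation lines of a multi-line entry (like the `errorMessage = ...` line above) are skipped because they don't start with a timestamp, so open the file at the reported line number to read the whole entry.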
In that log entry, Horizon encodes the base image reference inside its entity IDs (those long strings look like Base64, not plain text), so at the time we really couldn’t tell what it was trying to look for (although the error message itself was a clue, so keep reading). After chatting with a colleague at VMware, it was noted that we were using the HTML5 interface for Horizon management (as you should these days) and that we might be able to get more information by doing the Recompose in the FLEX interface. Although FLEX is going away, it still has some features and reporting/feedback that are not in the HTML5 interface yet. Per my source, FLEX was deprecated in ESXi 7.0 because the HTML5 interface has been built out well enough to be the only management interface for ESXi and vCenter. However, other products’ HTML5 management consoles are still being developed, specifically Horizon’s. So, as long as your version of Horizon shipped with a FLEX console, you can still use that alternate console.
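In hindsight, the long ID segments in that error are readable: each "/"-separated piece decodes as unpadded Base64. Here is a minimal sketch of decoding the BaseImageVm ID from the log above. The helper names are mine, and treating the segments as URL-safe Base64 is an assumption based on how this one ID decodes, not on any Horizon documentation.

```python
import base64


def decode_segment(segment):
    """Decode one unpadded Base64 segment; return None if it is not UTF-8 text."""
    padded = segment + "=" * (-len(segment) % 4)  # restore the stripped padding
    try:
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


def decode_entity_id(entity_id):
    """Split an ID on "/" and decode every segment after the leading type name."""
    kind, *segments = entity_id.split("/")
    return kind, [decode_segment(s) for s in segments]


# The BaseImageVm ID copied from the log excerpt above.
BASE_IMAGE_ID = (
    "BaseImageVm/Yjg3YTVlNTYtNTVhMy00YWIxLTkyOTEtMTc3YjAxMThmOTZl/"
    "MjY3NzJkN2QtODBkYi00OWI1LTkxMmMtMTM0MDNjZTY1OGEw/"
    "L0RhdGFjZW50ZXIvdm0vVkRJIC0gR3JhcGhpY3M/dm0tMTA5"
)

kind, parts = decode_entity_id(BASE_IMAGE_ID)
print(kind)      # BaseImageVm
print(parts[2])  # /Datacenter/vm/VDI - Graphics
print(parts[3])  # vm-109
```

The first two segments decode to what look like GUIDs, the third is the inventory path the pool was configured with, and the last looks like a vCenter object reference. So the path Horizon was searching for was sitting in the log all along.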
So, bringing up the FLEX console and walking through a Recompose function got us some additional information! Take a look:
That file reference is clue #2 of the puzzle – we now know what file the Recompose is looking for. We drilled in to vCenter (blacked out) to verify that the VM is there, and sure enough, under “VMs & Templates” it was located at \DataCenter\vSAN\VDI-Graphics. But that still didn’t look right…
On a hunch, I asked the customer to clear the error and use the “Change/Browse” button on the “Parent VM” field (in the background). Once we did that, we found the problem. In the pop-up for locating the Parent VM, we were presented with a “/Datacenter/vm/Parent/…” folder structure, where all the Parent VMs were located, not “/Datacenter/vm/…”
Apparently, some process or someone had moved the Parent VMs down one level, which broke every Pool Recompose operation until the Parent VM field was re-pointed to the new location.
Once we re-pointed the pool to the new location of the Parent VM, the Recompose went off without a hitch!! The customer went back and checked the config of the other pools and found they were affected by this issue as well. Fortunately, you can re-point in either the HTML5 or FLEX interface, but this error wasn’t handled well in the HTML5 console. Apparently, the HTML5 console is still a work in progress, so when you run into error conditions that aren’t explained well, give the FLEX console a shot!
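If you have a lot of pools to check for the same problem, the comparison itself is simple once you have the two lists in hand. The sketch below assumes you have already exported each pool's configured Parent VM path and the current VM inventory paths from vCenter by whatever means you prefer. The function name and all of the sample pool and VM names are hypothetical, and nothing here talks to Horizon or vCenter.

```python
def find_stale_parents(pool_parents, inventory_paths):
    """Return {pool: (configured_path, candidates)} for pools whose Parent VM path is gone.

    candidates lists inventory paths that end in the same VM name, i.e. likely
    places the Parent VM was moved to.
    """
    inventory = set(inventory_paths)
    stale = {}
    for pool, configured in pool_parents.items():
        if configured in inventory:
            continue  # the path still resolves, nothing to fix
        vm_name = configured.rsplit("/", 1)[-1]
        candidates = sorted(p for p in inventory if p.rsplit("/", 1)[-1] == vm_name)
        stale[pool] = (configured, candidates)
    return stale


# Hypothetical sample data, for illustration only.
pools = {
    "pool-a": "/Datacenter/vm/gold-a",
    "pool-b": "/Datacenter/vm/Parent/gold-b",
}
inventory = ["/Datacenter/vm/Parent/gold-a", "/Datacenter/vm/Parent/gold-b"]

for pool, (old, found) in find_stale_parents(pools, inventory).items():
    print(f"{pool}: {old} not found; candidates: {found}")
```

With the sample data, only pool-a is reported, with the Parent folder copy of its VM as the single candidate.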
Hope this helps.