As I remember it, the strengths specifically applied to static data, which can be stored on external media ("when the scene doesn't change"). But if you apply transformations to the voxel cloud (aka animation), that has to go into memory somehow, yes? Transformations for millions to billions of voxels, with their data, can't be cheap.

The flaw in your logic, which is totally understandable given the vastly different way this system works compared to all its predecessors, is that this system doesn't apply transformations to billions of voxels, because they are never in memory. They stay on the hard drive.

Their engine takes a 3D mesh inside 3D Max, for example, then writes a name for every side of the object, and then further writes another name for, say (just an example), six more angles of that one side. Now every object in the scene has a complex name. When the game engine loads an object, it doesn't load the object at all; it merely loads the name of the object and uses a placeholder for it. Imagine your world was not filled with objects, just flat planes with names on them. The name is the reference to the actual object on the disc, but in the renderer, all the game would see is a cardboard cut-out with the name on it.

When the renderer determines an object is being looked at by the camera (i.e., your character), it checks whether the object is behind other objects, fully occluded, and therefore never needs to be turned on, since it can't be seen. The objects in the foreground are turned on, but only the parts that are visible to your character (your monitor's view).

Now all of those pixels on the screen (using the referenced cardboard cut-outs) are looked at by the engine one pixel at a time, just one, and it never loads the mesh or the texture into memory. The engine is smart enough to determine each pixel's color without ever loading them. It references (like Google) the pixel it's looking for, combines just those pixels in its rendering eye, produces the color, and the pixel is displayed on your screen where that object would be.
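To make the "name, don't load" idea concrete, here's a minimal toy sketch in Python. To be clear: this is my own guess at the concept, not Euclideon's actual code; the index format, names, and offsets are all made up. The point it illustrates is that the frame only ever touches the handful of entries its pixels reference, leaving everything else "on disc".

```python
# Toy sketch of the "name, don't load" idea: the world keeps only names
# (the cardboard cut-outs); actual data is fetched one pixel at a time.
# All names, offsets, and colors here are invented for illustration.

# The "index" stands in for data living on disk: every named surface
# maps to where its data would be stored and what color it resolves to.
disk_index = {
    "dragon.A1":   {"offset": 0x1000, "color": (180, 40, 40)},
    "dragon.A2":   {"offset": 0x2000, "color": (160, 30, 30)},
    "mountain.B1": {"offset": 0x9000, "color": (90, 90, 100)},
}

def fetch_pixel_color(name):
    """Simulate a disk seek: resolve one pixel's color by name,
    without loading the whole mesh or texture it belongs to."""
    return disk_index[name]["color"]

def render(screen_pixels):
    """screen_pixels maps (x, y) -> the name of the nearest visible
    surface there (already occlusion-culled). One lookup per pixel."""
    return {xy: fetch_pixel_color(name) for xy, name in screen_pixels.items()}

# A two-pixel 'screen': only these two index entries are ever touched.
frame = render({(0, 0): "dragon.A1", (1, 0): "mountain.B1"})
print(frame[(0, 0)])  # (180, 40, 40)
```

Note the cost scales with the number of screen pixels, not with how much data sits in the index, which is the whole appeal of the scheme as described.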
More simply, it's like this: imagine you had a dragon model, and on every portion of the dragon you wrote A, B, C, D, E, F, G, H, I, J, K... until you had covered all the sides that can be seen with reference names. Then, for each of the 16 possible angles, you had further subdivisions (A1 through A8, etc.). Now the dragon is on your screen and you are looking at it. But it's facing you, so only A, D, E, and G would be visible, let's say.

The engine does a Google-style search for all the indexed names we'd be looking at (the ones facing the camera) and turns them on. Then it determines which ones are behind the others (fully occluded) and turns them back off. Whatever is left is all the screen can see. No other object in the world behind the dragon, the mountains, the trees, none of it, can be seen, so they are all turned off where the dragon would be, because the dragon's pixels overlap those parts of the mountains and trees.

So now that the screen is determined to contain A1-A7, D2-D6, E5-E6, G1, G3, and G6 (only a few parts of the entire screen), it can begin looking at the pixels. It matches the address of the mesh with the relevant addresses of the textures and combines them, as a rendering engine would, to produce the pixel at whatever color it would have.

It uses this Google-style referencing system (which is very fast, much faster than loading models and textures into memory, manipulating them endlessly, and then trying to scale the data down to the screen) to search for ONLY THE PIXELS IT NEEDS, once per frame, obtaining just what is seen and nothing more, leaving all other data on the disc untouched if not needed. Thus, the entire game runs off your hard drive with very little need for the GPU...
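The dragon example boils down to two filters over the name index: keep what faces the camera, then drop what's occluded. Here's a small sketch of that filtering, again purely illustrative; the patch names, facing vectors, depths, and the crude depth-based occlusion test are all assumptions of mine, not the real algorithm.

```python
# Toy version of the dragon example: patches named A1, A7, D2, K3, each
# with a facing normal and a depth from the camera. Everything here is
# an illustrative guess at the described two-step visibility filter.

patches = {
    "A1": {"facing": (0, 0, -1), "depth": 2.0},  # toward the camera
    "A7": {"facing": (0, 0, -1), "depth": 2.1},
    "D2": {"facing": (0, 0, -1), "depth": 2.5},
    "K3": {"facing": (0, 0,  1), "depth": 2.0},  # back side, faces away
}

CAMERA_DIR = (0, 0, 1)  # camera looks down +z

def faces_camera(normal):
    # A patch faces the viewer when its normal points against the view
    # direction, i.e. the dot product is negative (back-face culling).
    return sum(n * c for n, c in zip(normal, CAMERA_DIR)) < 0

# Step 1: "search the index" for patches turned toward the camera.
candidates = {n: p for n, p in patches.items() if faces_camera(p["facing"])}

# Step 2: turn off fully occluded patches. As a stand-in for a real
# occlusion test, keep only patches near the closest candidate's depth.
nearest = min(p["depth"] for p in candidates.values())
visible = sorted(n for n, p in candidates.items()
                 if p["depth"] <= nearest + 0.2)

print(visible)  # ['A1', 'A7']
```

Only the names that survive both filters would then go through the per-pixel color lookup; K3 (facing away) and D2 (occluded in this toy test) never cause any data to be read at all.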
That's what I gather is the secret of this tech from all that I've read and studied about it ...