@inproceedings{chapman09vnuma,
author = "Matthew Chapman and Gernot Heiser",
title = "vNUMA: A Virtual Shared-Memory Multiprocessor",
booktitle = "USENIX'09",
year = 2009,
}
Notes from Zhonghua
The Itanium architecture is rarely used today, so the implementation of this design seems to have little use in real-world computing.
In Sec 3.1, the authors describe the difficulties of "diffing" with a detailed example. The problems with diffing do exist and strongly affect the accuracy and complexity of the design. I think we could use a third state, "not modified", to represent a byte that has not been changed. (The complexity of this third state can be discussed further.)
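To make the third-state idea concrete, here is a minimal Python sketch. It only illustrates the suggestion above and is not vNUMA's mechanism; `NOT_MODIFIED`, `make_update`, `apply_update` and `value_diff` are hypothetical names. It assumes the writer somehow knows which byte offsets it stored to, which a value-comparison diff cannot recover when a store writes the value that was already there.

```python
NOT_MODIFIED = None  # hypothetical third state: this byte was never written


def make_update(page_size, writes):
    """Build a per-byte update from (offset, value) stores.

    Bytes that were not stored to stay NOT_MODIFIED.
    """
    update = [NOT_MODIFIED] * page_size
    for offset, value in writes:
        update[offset] = value
    return update


def apply_update(page, update):
    """Apply only the bytes that were actually written."""
    for offset, value in enumerate(update):
        if value is not NOT_MODIFIED:
            page[offset] = value
    return page


def value_diff(twin, current):
    """Conventional diff, for comparison: reports only bytes whose value changed."""
    return {i: c for i, (t, c) in enumerate(zip(twin, current)) if t != c}
```

A store of 0 to a byte that already holds 0 is invisible to `value_diff`, but `make_update` still records it as a written byte. The cost is one extra state per byte, which is the complexity question raised above.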
In Sec 3.2.2, the authors mention: "For pages in write-update mode, vNUMA broadcasts writes to all nodes." This may limit vNUMA's scalability, so that it can only be used in small clusters.
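A rough back-of-the-envelope sketch of why broadcasting could hurt scalability. This is my own simplification, not a model from the paper: it assumes every node writes to write-update pages at the same rate and every other node has to process each of those writes, and it ignores any batching or filtering. The function names are hypothetical.

```python
def updates_received_per_node(num_nodes, writes_per_node):
    """Each node must process the broadcast writes of all the other nodes."""
    return (num_nodes - 1) * writes_per_node


def updates_processed_cluster_wide(num_nodes, writes_per_node):
    """Total update-processing work summed over all nodes."""
    return num_nodes * updates_received_per_node(num_nodes, writes_per_node)
```

Under these assumptions the per-node load grows linearly with the number of nodes and the cluster-wide load grows quadratically, e.g. going from 4 to 8 nodes at 1000 writes per node raises the total from 12000 to 56000 updates.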
In Sec 4.1, the authors mention three mechanisms to avoid thrashing: 1) introducing an artificial delay to break the livelock; 2) putting the machine into single-step mode after receipt of a page, to guarantee that at least one instruction can be executed before a page is transferred; 3) consulting the performance-monitor register that counts retired instructions, to determine whether progress has been made since the last page transfer.
In my opinion, there is no difference between the first two mechanisms: the second just sets the artificial delay of the first to exactly the running time of one instruction.
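The third mechanism can be sketched roughly as follows. This is a hypothetical illustration, not the paper's code: `PageHolder` and its methods are made-up names, and `retired_now` stands in for a read of the retired-instruction counter (the actual register access is not shown). I also assume that a page request is simply deferred when no progress has been made.

```python
class PageHolder:
    """Tracks whether any instruction has retired since the page arrived."""

    def __init__(self):
        self.retired_at_receipt = 0

    def on_page_received(self, retired_now):
        # Remember the counter value at the moment the page arrives.
        self.retired_at_receipt = retired_now

    def may_release(self, retired_now):
        # Give the page away only if the counter has advanced,
        # i.e. at least one instruction retired since receipt.
        return retired_now > self.retired_at_receipt
```

For example, after `on_page_received(100)`, `may_release(100)` is false and `may_release(101)` is true.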
(12-11-2010 05:03 PM) szh wrote:
In my opinion, there is no difference between the first two mechanisms: the second just sets the artificial delay of the first to exactly the running time of one instruction.
One difference between the artificial delay and the single-instruction trap is that the latter guarantees that at least one instruction is executed (progress is made) on every page fault. An artificial delay cannot make this guarantee, no matter how long it is.
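A toy model of this difference, under assumptions of my own (it is not taken from the paper): time is in abstract units, `exec_latency` is how long the node must hold the page before the faulting instruction retires, and `steal_after` is when a competing request for the page arrives. All names are hypothetical.

```python
def progress_with_delay(delay, steal_after, exec_latency):
    """Artificial delay: the page is held for a fixed time after receipt.

    The page leaves at max(delay, steal_after); the instruction retires
    only if the page was held for at least exec_latency.
    """
    release_time = max(delay, steal_after)
    return release_time >= exec_latency


def progress_with_single_step(steal_after, exec_latency):
    """Single-step trap: release is deferred until one instruction has retired."""
    release_time = max(steal_after, exec_latency)
    return release_time >= exec_latency  # true by construction
```

With `delay=10` and `steal_after=1`, the delay policy makes progress when `exec_latency=5` but not when `exec_latency=50`. For any fixed delay there is a larger latency that defeats it, whereas the single-step policy makes progress in both cases because the release is tied to the instruction retiring rather than to a clock.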