You can find the explanation of Algorithm 4.3.1D, as it appears in the book Art of The Computer Programming Vol. 2 (pages 272-273) by D. Knuth in the appendix of this question.
It appears that, in the step D.6, qhat is expected to be off by one at most.
Lets assume base is 2^32 (i.e we are working with unsigned 32 bit digits). Let u = [238157824, 2354839552, 2143027200, 0] and v = [3321757696, 2254962688]. Expected output of this division is 4081766756 Link
Both u and v is already normalized as described in D.1(v[1] > b / 2 and u is zero padded).
First iteration of the loop D.3 through D.7 is no-op because qhat = floor((0 * b + 2143027200) / (2254962688)) = 0 in the first iteration.
In the second iteration of the loop, qhat = floor((2143027200 * b + 2354839552) / (2254962688)) = 4081766758 Link.
We don't need to calculate steps D.4 and D.5 to see why this is a problem. Since qhat will be decreased by one in D.6, result of the algorithm will come out as 4081766758 - 1 = 4081766757, however, result should be 4081766756 Link.
Am I right to think that there is a bug in the algorithm, or is there a fallacy in my reasoning?


