Does the CS register need to be set when setting up Unreal Mode?

Question

The OSDev Wiki describes the general procedure of switching to unreal mode, with an example bootloader:

ORG 0x7c00                ; add to offsets
 
start:
   xor ax, ax             ; make it zero
   mov ds, ax             ; DS=0
   mov ss, ax             ; stack starts at seg 0
   mov sp, 0x9c00         ; 2000h past code start, 
                          ; making the stack 7.5k in size
 
   cli                    ; no interrupts
   push ds                ; save real mode
 
   lgdt [gdtinfo]         ; load gdt register
 
   mov  eax, cr0          ; switch to pmode by
   or al,1                ; set pmode bit
   mov  cr0, eax
   jmp 0x8:pmode
 
pmode:
   mov  bx, 0x10          ; select descriptor 2
   mov  ds, bx            ; 10h = 10000b
 
   and al,0xFE            ; back to realmode
   mov  cr0, eax          ; by toggling bit again
   jmp 0x0:unreal
 
unreal:
   pop ds                 ; get back old segment
   sti
 
   mov bx, 0x0f01         ; attrib/char of smiley
   mov eax, 0x0b8000      ; note 32 bit offset
   mov word [ds:eax], bx
 
   jmp $                  ; loop forever
 
gdtinfo:
   dw gdt_end - gdt - 1   ;last byte in table
   dd gdt                 ;start of table
 
gdt:        dd 0,0        ; entry 0 is always unused
codedesc:   db 0xff, 0xff, 0, 0, 0, 10011010b, 00000000b, 0
flatdesc:   db 0xff, 0xff, 0, 0, 0, 10010010b, 11001111b, 0
gdt_end:
 
   times 510-($-$$) db 0  ; fill sector w/ 0's
   dw 0xAA55              ; Required by some BIOSes

I don't really understand the role of the CS register here.

We only care about increasing the limit of the data segment, not of the code segment, so we would only need to set the DS register.

Why does this code touch CS at all? Wouldn't a mov to DS work to load the GDT entry right after mov cr0, eax (or after a near jump, just to flush the instruction prefetch queue in case there are any CPUs where that matters)?

This code does two far jumps. One far jump (in pmode) would leave CS = 8, not matching CS.base upon return to "real" (unreal) mode, so save/restore of CS:IP in interrupts would break things. But would zero far jumps also be an option?

In "real" mode, setting CS only sets the segment base address, not overriding the limit=unlimited IIRC, like for other segments. ([Segment size in x86 real mode](https://stackoverflow.com/q/17786357) / [How can a 32-bit x86 CPU start with reset vector 0xFFFFFFF0 even though it starts in 16-bit real mode?](https://retrocomputing.stackexchange.com/a/27038)). It seems unnecessary to do a far jump after returning to real mode, though. Assuming that segment descriptor had base = 0 and limit = -1. But I've just read about unreal mode, I haven't played around with code that uses it. — Peter Cordes, Aug 18 '23 at 09:57
BTW ISDM volume 3 mentions "Execute a far JMP instruction to jump to a real-address mode program. This operation flushes the instruction queue and loads the appropriate base-address value in the CS register" — harold, Aug 18 '23 at 09:59
Besides @harold being right about a far jmp flushing the instruction prefetch queue, there are wider ramifications. While the base in CS will be 0, the actual value in the CS register will continue to be 8. If you enable interrupts and get a hardware interrupt or attempt to do a BIOS call (as an example), those will cause a CS of 0x08 to be put on the stack, and when it comes time for the `iret` to return from the interrupt or BIOS call, it will restore CS to 0x08 with the right IP (instruction pointer). That will return to somewhere you don't expect (in this case 0x08:xxxx instead of 0x00:xxxx) — Michael Petch, Aug 18 '23 at 14:29
Here's [example code](https://pastebin.com/y8XFjNJ6) (based on the code in the question) that acts differently when jmp 0x0:unreal is called vs when it isn't. It uses a simple FAR CALL (that pushes the return address CS:IP) and then returns. Observe what happens when you use jmp 0x0:unreal and when you don't. This is a similar problem encountered when the processor will push CS:IP and then IRET from an interrupt handler or BIOS call in real mode when you don't set CS to a value you expect (like 0 in this case) — Michael Petch, Aug 18 '23 at 15:52
Thanks for all your answers! So if I understand correctly, I have to do the `jmp 0x8:pmode` like @harold said to flush the instruction prefetch queue, otherwise the CPU might translate the machine code incorrectly. Later I have to do the subsequent `jmp 0x0:unreal` because in real mode CS refers to the segment's start address and not a descriptor, so it wouldn't be good to have 0x8 still in there while in real mode. Is that right? — moehr1z, Aug 18 '23 at 19:00
@MichaelPetch So wouldn't that also mean, that if I flush the queue another way, I could omit both far jumps? CS would hold 0x0 throughout the program. Wouldn't this only pose a problem if CS were to be read while we are in protected mode? As far as I understand this wouldn't occur, as interrupts are disabled until we are back in real mode again, where CS being 0x0 would make sense again. — moehr1z, Aug 18 '23 at 19:01
Yes, you'll note in my pastebin link (the code I gave) there is an example of using an unconditional jmp to the next instruction (rather than a far jump, see the code comment). That also has the side effect of flushing the instruction prefetch queue. The JMP 0x08:pmode is actually required to set a proper code segment with a 32-bit Code Selector AND it also flushes the instruction prefetch queue. The Code Selector's D-bit is used by the processor to determine the default operand size (32-bit, 16-bit) — Michael Petch, Aug 18 '23 at 19:44
@moehr1z: No, `jmp 0x8:pmode` is required to load a segment descriptor with limit=unlimited as part of the point of entering protected mode to set up unreal. Flushing the instruction prefetch queue is merely incidental, although of course that's also important on CPUs where that matters. The most important reason `jmp 0x0:unreal` is necessary is what Michael Petch pointed out: an interrupt handler will save/restore the current CS:IP, so CS needs to be `0` not `8` for that restore to put the internal CS.base back to the way it was (`0`) after leaving protected mode. — Peter Cordes, Aug 19 '23 at 04:41
@PeterCordes your answer confuses me a bit. Why should maximizing the limit of the code segment be the point of going into unreal mode? For my purposes it should suffice to increase the limit of the data segment only. Also, the limit set in the descriptor of the code segment in the example is 0xffff, which is still 64KiB. So the limit would be the same after `jmp 0x8:pmode` as it was before. — moehr1z, Aug 19 '23 at 08:34
Your question only talked about setting CS, not DS; I assumed you were aiming for [huge unreal mode](https://wiki.osdev.org/Unreal_Mode#Huge_Unreal_Mode) where you remove the limit for all segments, as a reason for messing with CS at all. (But I was forgetting that doesn't work easily since real-mode will only save/restore the low 16 (IP) of EIP, so it's not very useful.) If you don't want to set the CS limit, I'm not sure you do need to jump to a 32-bit code segment and back. I thought `mov ds, bx` would still work as expected as long as the pmode bit in CR0 was set, but I haven't tried. — Peter Cordes, Aug 19 '23 at 08:46
@MichaelPetch: Why do we need a 32-bit code segment at all here? What happens if we did `or` / `mov cr0, eax` / `mov ds, bx` (load DS from a segment descriptor without a limit) / `and` / `mov cr0, eax` (back to real mode). Do segment descriptors only get loaded properly if the current code segment is 32 or 64-bit, not if we haven't far-jumped since leaving real mode? — Peter Cordes, Aug 19 '23 at 08:50
@PeterCordes I'm sorry for being unclear with my question. But that is exactly the point that I didn't understand. Why does the example code see the need to mess with cs if we only care about ds? — moehr1z, Aug 19 '23 at 08:56
Yeah, now I wonder the same thing. According to https://wiki.osdev.org/GDT, there is a "DB" flag to indicate 16-bit or 32-bit segment. We'd have to check Intel's full SDM, not the summary on osdev, to see if being in 16-bit mode stops you from using a 32-bit data segment, or if a 16-bit data segment can't use the full width of the "limit" field. OSdev says "*Hence, if you choose page granularity and set the Limit value to 0xFFFFF the segment will span the full 4 GiB address space in 32-bit mode.*" That phrasing might be a hint that it doesn't mean that or work in 16-bit protected mode. — Peter Cordes, Aug 19 '23 at 09:04
@PeterCordes : I would agree, If you are in 16-bit real mode and switch and enable protected mode you are in 16-bit protected mode (quasi). The default operand size is going to be 16-bits. There is nothing that says you can't load selectors in 16-bit protected mode that have 32-bit limits. In theory I'd say there is no need to actually have a GDT with a 16 or 32-bit Code Segment selector if you are just reloading the code segment registers. Some people may do it to guarantee the CS segment is set to some known setting if it was possible it was something nonstandard to begin with... — Michael Petch, Aug 19 '23 at 14:59
... @PeterCordes : My view is this. If you are in a situation where you don't know the previous state of CS and its hidden descriptor values (maybe someone set the limit of CS to 0xffffffff and you want to ensure it is 0xffff) and you want to ensure they are specific values, setting it is probably a good idea, but in practice it isn't necessary. I also believe that for old timers who have been around, we are dealing with something that was originally undocumented by Intel, and the procedure used is more historical. — Michael Petch, Aug 19 '23 at 15:03
What I mean by historical is that originally unreal mode was being used in the 80s without being documented by Intel. However, companies like Microsoft were taking advantage of that mode, most notably in 386 HIMEM.SYS (or HIMEM386.SYS). Intel over the years eventually documented behaviour about the hidden descriptor caches etc. In the dark age of using magical voodoo, I think developers chose to use the methods that worked, in case Intel changed something under the hood that broke code that didn't adhere to a specific way of doing things. — Michael Petch, Aug 19 '23 at 15:06
If one is interested in the code for the on-demand unreal mode that 386 HIMEM had, one may find this interesting: https://github.com/MikeyG/himem/blob/master/oemsrc/xm386.asm . From line 332 down, you'll see they use a far jump to set CS. A lot of code is based on the concepts Microsoft used historically, in a piece of software that Intel didn't want to break with internal changes. Decades have passed since then, of course, and the Intel processors are far better documented. — Michael Petch, Aug 19 '23 at 15:09
I got in a discussion about the Instruction Prefetch Queue as well in past SO comments. See this question: https://stackoverflow.com/questions/39964347/why-should-prefetch-queue-be-invalidated-after-entering-protected-mode . In some places it is suggested that you need to do a FAR JMP after enabling protected mode and the reason being Instruction Prefetch Queue. Of course there are other instructions (and forms of JMP) that flush the IPFQ without reloading CS. — Michael Petch, Aug 19 '23 at 15:15
The SO question in my last comment links to a piece of code in an old Intel manual that actually uses a near jump (without setting CS) to flush the IPFQ, and loads the DS and ES segment registers with 4GiB limits all without actually loading CS with a new descriptor. The code in question was meant to be used as a template for ROM code that executes after the machine is reset but suggests that doing this is acceptable if you know the state of CS to begin with and you have no reason to change it. — Michael Petch, Aug 19 '23 at 15:18
So if you ask me if you can do this without actually doing a FAR JMP to set CS with a 32-bit CS selector (or even a 16-bit one) - I'd say yes, but for historical reasons people may choose to use the more tried and true method that has been around for 30+ years. — Michael Petch, Aug 19 '23 at 15:19
Note: Intel suggests a method to enter Long Mode which requires going into protected mode first, but OS developers have found that they can shortcut around that, and the method seems to work on a wide range of 64-bit Intel processors or compatible processors. That's not to say that it won't break in the future or that there won't be a chip manufacturer who does something that breaks that method. The question is, do you want to walk the straight and narrow and do it a documented way, or try something else? — Michael Petch, Aug 19 '23 at 15:23
@MichaelPetch: Thanks, "belt and suspenders" or copying boilerplate were my guesses why this osdev example code might be written that way if not actually necessary. The fact that it was previously undocumented definitely sheds more light on why defensive coding practices would be widely used. Your comments could make a good answer, at least if the question were adapted to ask what the OP said in comments they really wanted to know: why we're touching CS at all when we only care about setting the DS base/limit. With the edit at the bottom, it can be read as asking that. — Peter Cordes, Aug 19 '23 at 18:27
@MichaelPetch thanks for your detailed answer! That cleared it up. — moehr1z, Aug 20 '23 at 10:09
@PeterCordes I changed the question, so hopefully it is more clear now. — moehr1z, Aug 20 '23 at 10:09
Good edit. I refined it a bit more, now that it's clear what you're trying to ask, so hopefully @MichaelPetch's comments will drop nicely into an answer that doesn't have to go too far out of its way dealing with side issues (since the question phrasing already covers the problem with doing 1 far jump rather than 0 or 2, which also got discussed in comment after I forgot about that problem in my first comment, xD.) — Peter Cordes, Aug 20 '23 at 15:09
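
For reference, the zero-far-jump variant discussed in the comments above might look roughly like the following. This is only a sketch based on the bootloader in the question, not a tested or authoritative implementation. It assumes NASM and a BIOS that loads the sector at physical address 0x7C00. The local labels `.in_pmode` and `.in_real` are made up for illustration, and the GDT is reduced to the null entry plus the flat data descriptor, so the data selector is 0x08 here rather than 0x10. Whether skipping the CS reload is safe on every CPU is exactly the open point in the thread.

```nasm
ORG 0x7c00                ; sector is loaded at physical 0x7C00
BITS 16

start:
   xor ax, ax
   mov ds, ax             ; DS=0, so [gdtinfo] resolves correctly
   mov ss, ax
   mov sp, 0x9c00         ; same stack location as the original example

   cli                    ; no interrupts while CR0.PE is set
   push ds                ; save the real-mode DS value (0)

   lgdt [gdtinfo]         ; load the GDT register

   mov eax, cr0
   or al, 1               ; set CR0.PE (protected mode)
   mov cr0, eax
   jmp short .in_pmode    ; near jump: flushes the prefetch queue,
                          ; CS and its hidden descriptor are untouched
.in_pmode:
   mov bx, 0x08           ; ASSUMPTION: flat data descriptor is GDT entry 1
   mov ds, bx             ; caches base=0, limit=4 GiB in DS

   and al, 0xFE           ; clear CR0.PE again
   mov cr0, eax
   jmp short .in_real     ; near jump again, still no CS reload
.in_real:
   pop ds                 ; DS=0 again; the cached 4 GiB limit remains
   sti                    ; CS still holds its original real-mode value

   mov bx, 0x0f01         ; attribute/char of smiley
   mov eax, 0x0b8000      ; 32-bit offset into the text-mode buffer
   mov word [ds:eax], bx

   jmp $                  ; loop forever

gdtinfo:
   dw gdt_end - gdt - 1   ; last byte in table
   dd gdt                 ; start of table

gdt:        dd 0, 0       ; entry 0 is always unused
flatdesc:   db 0xff, 0xff, 0, 0, 0, 10010010b, 11001111b, 0
gdt_end:

   times 510-($-$$) db 0  ; fill sector with zeros
   dw 0xAA55              ; boot signature
```

Because CS is never written, the value that an interrupt or BIOS call pushes and later restores with `iret` is the same one the code started with, which avoids the problem described for the single-far-jump case. The cost is that the hidden CS descriptor is left in whatever state it had on entry, which is the reason given in the comments for why the far-jump version is often preferred.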
