How to avoid caching when writing to MMIO registers?

Question

I'm writing a custom OS in VirtualBox and having trouble reading from and writing to the IOAPIC MMIO registers: it seems to ignore the write to the index register. After loading R8 with the IOAPIC base address (determined from ACPI enumeration to be 0xFEC00000), I use the following routines to read/write:

; -----------------------------------------------------------------------------
; IN :  R8 = ioapic address, EBX = index register
; OUT:  ECX = return value
ioapic_read:
    mov [r8], ebx
    mov ecx, [r8 + 0x10]
    ret
; -----------------------------------------------------------------------------
; IN :  R8 = ioapic address, EBX = index register, ECX = value
; OUT:  -
ioapic_write:
    mov [r8], ebx
    mov [r8 + 0x10], ecx
    ret

But an ioapic_read will always return the last value written (by ioapic_write), irrespective of the index used. I have identity paging set up to use flags 0x9B, which I think should disable caching.
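As a rough check of what 0x9B encodes, the sketch below decodes it as the flag bits of an x86-64 page directory entry that maps a 2 MB page (the comments further down confirm that is how it is used). The enum and function names are made up for illustration and do not come from the original code.

```c
#include <stdint.h>

/* Low flag bits of an x86-64 page directory entry (illustrative names). */
enum {
    PDE_PRESENT       = 1u << 0, /* P   */
    PDE_WRITABLE      = 1u << 1, /* R/W */
    PDE_USER          = 1u << 2, /* U/S */
    PDE_WRITE_THROUGH = 1u << 3, /* PWT */
    PDE_CACHE_DISABLE = 1u << 4, /* PCD */
    PDE_ACCESSED      = 1u << 5, /* A   */
    PDE_DIRTY         = 1u << 6, /* D   */
    PDE_PAGE_SIZE     = 1u << 7  /* PS: this entry maps a 2 MB page */
};

/* Flags for a present, writable, cache-disabled 2 MB mapping. */
uint64_t pde_uncached_2mb_flags(void)
{
    return PDE_PRESENT | PDE_WRITABLE | PDE_WRITE_THROUGH |
           PDE_CACHE_DISABLE | PDE_PAGE_SIZE;
}

/* Returns nonzero if the entry has the cache-disable bit set. */
int pde_is_cache_disabled(uint64_t pde)
{
    return (pde & PDE_CACHE_DISABLE) != 0;
}
```

Under this decoding, `pde_uncached_2mb_flags()` evaluates to 0x9B. The demand-paging handler shown later in the question ORs in 0x83 instead, which sets only Present, Writable and Page-Size.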

I have tried using pause after each of the movs. Didn't help. Tried mfences between the movs. Didn't help.

I have confirmed the 0xFEC00000 address is successfully identity mapped.

It looks like there's still some caching going on. What am I missing?
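For reference, the indirect access the two routines perform (select a register through the index register at offset 0x00, then move data through the window at offset 0x10) can be modeled in C roughly as follows. This is only a sketch: the function names are hypothetical, and `volatile` only stops the compiler from dropping or reordering the accesses. It does not change how the CPU caches the page.

```c
#include <stdint.h>

/* Register positions inside the IOAPIC MMIO block, in 32-bit words. */
#define IOAPIC_IOREGSEL 0u           /* byte offset 0x00: index register */
#define IOAPIC_IOWIN    (0x10u / 4u) /* byte offset 0x10: data window    */

/* Select a register, then read it through the data window. */
uint32_t ioapic_read_reg(volatile uint32_t *base, uint32_t index)
{
    base[IOAPIC_IOREGSEL] = index;
    return base[IOAPIC_IOWIN];
}

/* Select a register, then write it through the data window. */
void ioapic_write_reg(volatile uint32_t *base, uint32_t index, uint32_t value)
{
    base[IOAPIC_IOREGSEL] = index;
    base[IOAPIC_IOWIN] = value;
}
```

If `base` points at ordinary memory rather than at the device, the window location just holds whatever was stored there last, so a read returns the last value written regardless of the index. That matches the symptom described above.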

EDIT

I have discovered it's not a caching issue but something a lot stranger, at least to my ignorant brain. My identity paging works on demand: a page fault generates the correct physical page mapping in the tables.

This seems to be working but in the case of the IOAPIC mmio registers, I need to cause a page fault by doing a dummy read or write to the 0xFEC00000 address prior to attempting to use it. The even odder thing is that I need to do this dummy read enough instructions prior or it doesn't work. e.g.

This WORKS!

 mov eax, [os_IOAPICAddress]
 mov dword[rax], 0
 mov r8, rax
 .
 .
 .
 call ioapic_read

... this DOESN'T!

 mov eax, [os_IOAPICAddress]
 mov r8, rax
 mov dword[rax], 0
 .
 .
 .
 call ioapic_read

I suspect a pipelining/serializing issue, but I would really love to learn both why I need to page fault the address into the tables before using it for MMIO register access, and why I need to do it far enough in advance. In the latter case, how can I fix it so that it is serialized and I don't need to worry about it?

My identity paging routine:

pageFault_identity_0x0E:
    pop r8
    push rsi rdi rax rcx rdx r9

    test r8, 1
    jnz exception_gate_14
    mov rdx, cr2                                   ; faulting address
    shr rdx, 39
    and rdx, 0x1FF                                 ; get 9 bit index      

    mov rdi, cr3
    lea rsi, [rdi + rdx*8]
    mov rdi, [rsi]
    test rdi, 1
    jnz @f
    call set_new_page_table                                               
@@:
    shr rdi, 12                                     ; get rid of flags
    shl rdi, 12

    mov rdx, cr2
    shr rdx, 30                                     ; get 9 bit index    
    and rdx, 0x1FF

    lea rsi, [rdi + rdx*8]
    mov rdi, [rsi]
    test rdi, 1
    jnz @f
    call set_new_page_table                                               
@@:
    shr rdi, 12                                     ; get rid of flags
    shl rdi, 12

    mov rdx, cr2
    shr rdx, 21
    mov rax, rdx
    and rdx, 0x1FF                                  ; get 9 bit index    
    lea rsi, [rdi + rdx*8]

    shl rax, 21
    or rax, 0x83
    mov [rsi], rax

    shr rax, 21
    shl rax, 21

    pop r9 rdx rcx rax rdi rsi
    iretq
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;;
; IN:   rsi = address of blank entry
; OUT:  rdi = base address of new table, changes rax & rcx
;
set_new_page_table:                                ; make table, get it, zero it, insert base into previous table
    movzx rdi, [page_table_count]
    shl rdi, 12
    add rdi, NEW_PAGE_TABLES

    CLEAR_BLOCK rdi, 0x200                     ; clears 4096 bytes in rdi, returns rdi + 4096

    sub rdi, 0x1000
    lea rax, [rdi + 0x3]                              ; table base address
    mov [rsi], rax
    inc [page_table_count]
    ret
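The index arithmetic in the handler (shift the faulting address right by 39, 30 and 21 bits, mask each result with 0x1FF, then build a 2 MB entry from the aligned address) can be checked with a small C model. The function names are hypothetical. The flags value is passed in, so the 0x83 used by the handler is only one possible argument.

```c
#include <stdint.h>

#define PAGE_2MB_SHIFT 21

/* 9-bit table indices taken from a virtual address (4-level paging). */
unsigned pml4_index(uint64_t vaddr) { return (unsigned)((vaddr >> 39) & 0x1FF); }
unsigned pdpt_index(uint64_t vaddr) { return (unsigned)((vaddr >> 30) & 0x1FF); }
unsigned pd_index(uint64_t vaddr)   { return (unsigned)((vaddr >> 21) & 0x1FF); }

/* Identity-mapped 2 MB page directory entry: align the address down to a
   2 MB boundary (the shr/shl pair in the handler) and OR in the flags. */
uint64_t identity_pde_2mb(uint64_t vaddr, uint64_t flags)
{
    uint64_t base = (vaddr >> PAGE_2MB_SHIFT) << PAGE_2MB_SHIFT;
    return base | flags;
}
```

For the IOAPIC base 0xFEC00000 this gives PML4 index 0, PDPT index 3 and page directory index 502, and `identity_pde_2mb(0xFEC00000, 0x83)` yields 0xFEC00083.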

Are these functions `ioapic_read` and `ioapic_write` called from _C_ ? — Michael Petch, Oct 02 '16 at 23:47
Is the memory region marked non-cacheable? http://stackoverflow.com/questions/90204/why-would-a-region-of-memory-be-marked-non-cached — stark, Oct 03 '16 at 00:27
It is in the page table. But is there some other place I need to set the region as non-cacheable? — poby, Oct 03 '16 at 00:29
@Poby, you shouldn't. The CPU uses the most restrictive caching type set for the memory region accessed (and it seems you used UC- in the PTE). You should rule out any cache issue though: try disabling caching entirely (setting CD and NW in CR0 and flushing the cache) or try using the variable MTRRs to map the IOAPIC MMIO as UC-. — Margaret Bloom, Oct 03 '16 at 07:36
Did you actually verify that `r8` has the value you think it does? Did you happen to step through with a debugger? I noticed in code you didn't post here that you have an instruction `mov r8, [IOAPICAddress]` . How did you define `IOAPICAddress`? — Michael Petch, Oct 03 '16 at 08:03
@Margaret: Tried setting CD and NW, didn't help. So if it's not a caching issue, what else could it be? — poby, Oct 03 '16 at 08:42
@MichaelPetch: I have absolutely verified that R8 contains what it's supposed to. IOAPICADDRESS contains 0xFEC00000 which I extracted from the ACPI tables — poby, Oct 03 '16 at 08:43
If you read the registers before writing them do you get reasonable values? Most IOAPIC registers are fully writable, have you tried writing RO registers like IOAPICVER or IOAPICARB? — Margaret Bloom, Oct 03 '16 at 09:09
It doesn't seem to matter what value I select for the index. If I don't write first, I get back 0x8E000008. After I write, I get back whatever I wrote irrespective of index register used. — poby, Oct 03 '16 at 10:18
Ok, that 0x8E000008 looks suspiciously familiar. I'm pretty sure now my identity paging is cactus (faulty). — poby, Oct 03 '16 at 10:35
Yes it is! The page tables are working however... see the edit to the question. — poby, Oct 03 '16 at 16:34
Do you invalidate the page (Flushing TLB entries for it) when changing the page table bits in your page fault handler? — Michael Petch, Oct 03 '16 at 17:13
I tried that, but it made no difference. Is it needed? I thought flushing the TLB entries wouldn't be necessary if the page isn't present in the table anyway. — poby, Oct 03 '16 at 17:15
I don't know how your page fault handler and your memory mapping work so I was simply asking. — Michael Petch, Oct 03 '16 at 17:16
As I recall, you can change pretty much all the bits in a page table entry as long as the present flag remains clear (0). But if you are changing the present bit (1 to 0), then I believe you need to flush the TLB entries associated with that page. If the present bit is already set in an entry, you can change the available bits without worrying about a TLB flush. But if the present bit is already set and you change any bits other than the available bits, you need to flush the TLB entries. — Michael Petch, Oct 03 '16 at 19:42
@RossRidge I came back to revisit my comment here and I see you made a comment regarding the same error. You are quite right. — Michael Petch, Oct 03 '16 at 19:42
I don't know; it seems there might be caching going on. As an experiment, in just ioapic_read, place an `sfence` instruction between the two move instructions. — Michael Petch, Oct 03 '16 at 23:54
You said in your question _0x9B which I think should disable caching_. I assume by that you are actually using 4 MB paging? If that 0x9B is used as flags for a page table entry then I think "Houston, we have a problem." If it was for a page directory entry it would make some sense, but that would imply you are using pages larger than 4 KB. — Michael Petch, Oct 04 '16 at 00:09
@MichaelPetch: I'm using 2MB pages in long mode. The 0x9B is for a page directory entry. I tried using sfence as you suggested, early on but it made no difference — poby, Oct 04 '16 at 00:59
I knew you either had a bug or you were using something greater than 4kb. Your question didn't say it just said 0x9b with no real context. — Michael Petch, Oct 04 '16 at 01:02
I'm not sure that you'll get an answer besides people suggesting ways to track down the problem. This isn't a minimal complete verifiable example. What happens if you do the same code with caching turned off as Margaret suggested much earlier? If you made available your complete code I am sure someone could figure it out. There could be problems with your page tables, your fault handler, the caching in general, or something else. — Michael Petch, Oct 05 '16 at 01:22
Looking at your page fault handler: am I mistaken, or are you destroying the contents of _R8_ without restoring it? You pop the error code into _R8_, which means the previous value of _R8_ has been trashed. When your fault handler returns, _R8_ most likely no longer has its original value in it. — Michael Petch, Oct 05 '16 at 02:39
THANK YOU! That was it! I don't know how I missed it, looked at that code 100s of times. Such a stupid little error causing me so much grief! Please write it up as an answer so I can happily reward you with the points — poby, Oct 05 '16 at 02:54

Michael Petch · Accepted Answer · 2016-10-05T04:01:55.523

Given the original code, it looked as if you were setting the page directory entry bits properly to mark the MMIO region uncacheable, so I was convinced there was some other issue. With your subsequent edit you showed us your page fault handler pageFault_identity_0x0E:

pageFault_identity_0x0E:
    pop r8
    push rsi rdi rax rcx rdx r9

When the processor transfers control to this page fault exception handler, it passes an error code on the top of the stack as a parameter. The problem is that you replace the contents of R8 with the error code without saving and then restoring the register.

You'll have to modify your exception handler to preserve R8 and to load the error code into R8 from the proper stack offset rather than popping it. Just remember to ensure that the error code is no longer on the top of the stack prior to the IRETQ.

Likely the odd behaviour you have been getting is directly related to R8 not being properly restored upon return from a page fault.
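To illustrate the effect, here is a toy C model of a store through R8 that takes a first-touch page fault. Everything in it is a simplification: the struct, the handler functions and the fixed error code 0x2 (a not-present write from supervisor mode) are assumptions made for the sketch, not a description of the real handler beyond its `pop r8`.

```c
#include <stdint.h>

/* Toy CPU state: only R8 and whether the target page is mapped yet. */
struct toy_cpu {
    uint64_t r8;
    int page_mapped;
};

typedef void (*toy_pf_handler)(struct toy_cpu *cpu, uint64_t error_code);

/* Models the original handler: `pop r8` leaves the error code in R8. */
void toy_handler_clobbering(struct toy_cpu *cpu, uint64_t error_code)
{
    cpu->r8 = error_code;  /* R8 is never restored */
    cpu->page_mapped = 1;  /* the mapping itself is created correctly */
}

/* Models a handler that saves and restores R8 around its work. */
void toy_handler_preserving(struct toy_cpu *cpu, uint64_t error_code)
{
    uint64_t saved_r8 = cpu->r8;
    cpu->r8 = error_code;  /* R8 may be used as scratch in between */
    cpu->page_mapped = 1;
    cpu->r8 = saved_r8;
}

/* Models `mov [r8], ebx`: on a first-touch fault the handler runs and the
   instruction restarts. Returns the address the store finally targets. */
uint64_t toy_store_through_r8(struct toy_cpu *cpu, toy_pf_handler handler)
{
    if (!cpu->page_mapped)
        handler(cpu, 0x2); /* assumed error code: not-present write */
    return cpu->r8;
}
```

With the clobbering handler, a store that started out aimed at 0xFEC00000 ends up aimed at address 0x2, which is ordinary memory. This is also consistent with the two snippets in the question: in the working one the fault happens before `mov r8, rax`, so the clobbered R8 is overwritten with the correct address afterwards, while in the failing one R8 is loaded first and then trashed by the fault.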

A solution that may work is:

pageFault_identity_0x0E:
    push rsi
    push rdi
    push rax
    push rcx
    push rdx
    push r9
    push r8

    mov r8, [rsp+7*8]    ; Error Code is at offset RSP+7*8 after all the pushes
    ; Do exception handling work here

    pop r8
    pop r9
    pop rdx
    pop rcx
    pop rax
    pop rdi
    pop rsi

    add rsp, 8           ; Remove the error code
    iretq
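The `rsp+7*8` offset can be double-checked by writing the stack contents after the seven pushes as a C struct, lowest address first. The struct and field names are illustrative. The layout assumes 64-bit mode, where the CPU pushes SS, RSP, RFLAGS, CS and RIP, followed by the error code for a page fault.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack as seen from RSP after the handler's seven pushes. */
struct pf_stack_frame {
    uint64_t r8, r9, rdx, rcx, rax, rdi, rsi; /* pushed by the handler, r8 last */
    uint64_t error_code;                      /* pushed by the CPU for #PF      */
    uint64_t rip, cs, rflags, rsp, ss;        /* consumed by IRETQ              */
};

/* Byte offset of the error code from RSP. */
size_t pf_error_code_offset(void)
{
    return offsetof(struct pf_stack_frame, error_code);
}

/* Byte offset of the frame IRETQ expects at the top of the stack. */
size_t pf_iret_frame_offset(void)
{
    return offsetof(struct pf_stack_frame, rip);
}
```

The error code sits 56 bytes (7*8) above RSP. Once the seven registers are popped it is at the top of the stack, which is why the `add rsp, 8` is needed before IRETQ.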

I wasted 2 days of my life trying to find this. I debugged that paging routine so thoroughly I was sure it wasn't the culprit. But somehow I missed that pop R8. It's a load off my mind :) — poby, Oct 05 '16 at 03:25
Not a problem, I am really glad you added the page fault handler code. I had a suspicion it was the issue. Glad it works for you now. Kernel development and debugging can be tricky business. You probably just needed a fresh set of eyes to look at it. — Michael Petch, Oct 05 '16 at 03:27

poby · Answer 2 · 2016-10-12T08:04:30.400

Michael solved it but in the interests of completeness, I will post my final implementation.

pageFault_identity_0x0E:
    test qword[rsp], 1
    jnz exception_gate_14
    add rsp, 8

    push rsi rdi rax rcx rdx
    mov rdx, cr2                                   ; faulting address
    .
    .
    .
    pop rdx rcx rax rdi rsi
    iretq

EDIT: Have edited to remove the xchg.
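The `test qword[rsp], 1` at the top checks bit 0 of the page fault error code, which distinguishes a not-present fault from a protection violation on a present page. A small C decoder for the low error-code bits, with illustrative names, might look like this:

```c
#include <stdint.h>

/* Low bits of the x86 page fault error code (illustrative names). */
enum {
    PF_PRESENT = 1u << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE   = 1u << 1, /* 0: read access,      1: write access         */
    PF_USER    = 1u << 2, /* 0: supervisor mode,  1: user mode            */
    PF_RSVD    = 1u << 3, /* reserved bit set in a paging structure       */
    PF_IFETCH  = 1u << 4  /* fault caused by an instruction fetch         */
};

/* Mirrors `test qword[rsp], 1`: only not-present faults are candidates
   for on-demand identity mapping. */
int pf_is_not_present(uint64_t error_code)
{
    return (error_code & PF_PRESENT) == 0;
}

/* Returns nonzero if the faulting access was a write. */
int pf_is_write(uint64_t error_code)
{
    return (error_code & PF_WRITE) != 0;
}
```

In the handler above, faults with bit 0 set are passed on to `exception_gate_14`, and only not-present faults go on to build a mapping.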

edited Oct 12 '16 at 08:04

answered Oct 11 '16 at 18:59

I avoided using XCHG with a memory operand mainly because it implies a lock, which takes exclusive ownership of the corresponding cache line and carries a performance penalty. XCHG with two registers doesn't have that problem. Intel's optimization guide has this suggestion: "minimize the use of xchg instructions on memory locations" – Michael Petch Oct 11 '16 at 19:40
Cheers. I didn't know that. Have fixed it. – poby Oct 12 '16 at 08:09
