Operand type mismatch in x87 inline assembly in a Linux kernel module

Question

I really want to use floating point arithmetic in a Linux kernel module, just for the heck of it. I don't want to do anything fancy, just use the x87 trig instructions and/or the sqrt instruction, then assign the result to a variable. That's about it. So far, I've tried:

float sqroot(float arg){
    float returnValue;
    asm(
     "fld %1\n"
     "fsqrt\n"
     "fst %0"
     :"=r"(returnValue) 
     : "r"(arg)
    );
    return returnValue;
}

This fails miserably and yields the following error:

Error: operand type mismatch for `fld'
Error: operand type mismatch for `fst'

Any and all help will be appreciated.

Whaaaaaat? This isn't a recommendation question, it just has a bad title. Voting to reopen. — user253751, Feb 08 '16 at 22:26
Title has been edited to more closely match the content of the question, and not appear to be a simple recommendation for a tutorial. — Brian Campbell, Feb 08 '16 at 22:29
Your output `=r` and input `r` constraints currently indicate using a general-purpose integer register as argument to the `fld` and `fst` . I think you may need to change that to two input `m` (memory) constraints. — Iwillnotexist Idonotexist, Feb 08 '16 at 22:38
Side note: When you do floating point inside the kernel you have to take special care to save/restore the FP environment to prevent a context switch from messing up a userspace app. There are special kernel calls to do this [or you disable interrupts, etc. and save/restore manually] — Craig Estey, Feb 08 '16 at 23:00
@Iwillnotexist Idonotexist Changing the constraints to `=m` and `m` worked for me. — SUBmarinoff, Feb 08 '16 at 23:25
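
For reference, the memory-constraint fix described in the comments above might look roughly like this. It is a minimal sketch, not the poster's exact code: the name `sqroot_mem`, the explicit `s` size suffixes, and the popping store (`fstps` instead of `fst`, so the x87 stack is left balanced) are assumptions added here. It also assumes an x86 target, a GNU-compatible compiler, and at least one free x87 stack slot on entry.

```c
/* Sketch only: x86 target and GNU C inline asm assumed. */
static inline float sqroot_mem(float arg)
{
    float result;
    __asm__ ("flds %1\n\t"   /* push the 32-bit float at %1 onto the x87 stack */
             "fsqrt\n\t"     /* st(0) = sqrt(st(0)) */
             "fstps %0"      /* store st(0) as a 32-bit float, then pop it */
             : "=m"(result)  /* output lives in memory, not an integer register */
             : "m"(arg));    /* input is read from memory as well */
    return result;
}
```

The `m` constraints give `fld`/`fst` a memory operand, which is a form those instructions accept. The original `r` constraints handed them a general-purpose register, hence the operand type mismatch.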

score 1 · Answer 1 · edited Feb 18 '19 at 19:31

Using x87 from a kernel module will "work", but it silently corrupts user-space x87 / MMX state. See the related question: "Why am I able to perform floating point operations inside a Linux kernel module?"

You need kernel_fpu_begin() / kernel_fpu_end() to make this safe.
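
A rough sketch of what that bracketing could look like follows. The helper name `sqroot_guarded` and the pointer-based signature are hypothetical. The header location is an assumption that holds for recent x86 kernels, and older trees declare these functions elsewhere. The no-op stand-ins exist only so the sketch also compiles outside the kernel. Passing values through pointers is a cautious choice here, since kernel builds typically disable FP code generation and may reject `float` parameters or return values.

```c
#ifdef __KERNEL__
#include <asm/fpu/api.h>   /* assumed location of kernel_fpu_begin()/kernel_fpu_end() */
#else
/* User-space stand-ins (no-ops) so the sketch builds outside the kernel. */
static inline void kernel_fpu_begin(void) {}
static inline void kernel_fpu_end(void) {}
#endif

/* Hypothetical helper: writes sqrt(*in) to *out using x87 only. */
static void sqroot_guarded(const float *in, float *out)
{
    kernel_fpu_begin();               /* save user-space FPU state first */
    __asm__ volatile ("flds %1\n\t"   /* push *in onto the x87 stack */
                      "fsqrt\n\t"     /* st(0) = sqrt(st(0)) */
                      "fstps %0"      /* store to *out and pop */
                      : "=m"(*out)
                      : "m"(*in)
                      : "memory");    /* keep the asm inside the begin/end region */
    kernel_fpu_end();                 /* allow the saved state to be restored */
}
```

All x87 work should sit between the two calls, and the region should be kept short.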

Instead of loading/storing from inline asm, ask for the input and produce the output on the top of the x87 register stack, and let the compiler emit load/store instructions if needed. The compiler already knows how to do that; you only need inline asm for the sqrt instruction itself, which you can describe to the compiler this way:

static inline
float sqroot(float arg) {
    asm("fsqrt"  : "+t"(arg) );
    return arg;
}

(See the compiler-generated asm for this on the Godbolt compiler explorer)

The register constraints have to tell the block to use the floating point registers.
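
Since the question also mentions the x87 trig instructions, the same `"+t"` pattern should carry over to them. The sketch below is an illustration rather than part of the original answer. The names `sin_x87` and `cos_x87` are hypothetical, and it assumes an x86 target with a GNU-compatible compiler. Note that `fsin` and `fcos` only produce a result for arguments with magnitude below 2^63, and that inside a kernel module this still needs the `kernel_fpu_begin()` / `kernel_fpu_end()` bracketing.

```c
/* Sketch only: "+t" means the operand is both read from and written to st(0). */
static inline float sin_x87(float x)
{
    __asm__ ("fsin" : "+t"(x));   /* st(0) = sin(st(0)) */
    return x;
}

static inline float cos_x87(float x)
{
    __asm__ ("fcos" : "+t"(x));   /* st(0) = cos(st(0)) */
    return x;
}
```

Because the instruction neither pushes nor pops, no extra clobbers are needed, and the compiler handles getting the value into and out of `st(0)`.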

Or avoid inline asm entirely, using a GNU C builtin that can inline

You need to use -fno-math-errno for the builtin to actually inline as fsqrt or sqrtss, without a fallback to call sqrtf for inputs that will result in NaN.

static inline
float sqroot_builtin(float arg) {
    return __builtin_sqrtf(arg);
}

For x86-64, we get sqrtss %xmm0, %xmm0 / ret, while for i386 we get fld / fsqrt / ret. (See the Godbolt link above.) Constant propagation and other optimizations also work through __builtin_sqrtf.

EDIT: Incorporating @iwillnotexist-idontexist's point (re double loading).

Also, if it were me, I'd add static inline to the declaration and put it in a header file. This will allow the compiler to more intelligently manage registers and avoid stack frame overheads.

(I'd also be tempted to change float to double throughout. Otherwise, you're discarding the additional precision that is used in the actual floating point instructions. Although if you will end up frequently storing the values as float, there will be an additional cvtpd2ps instruction. OTOH, if you're passing arguments to printf, for example, this actually avoids a cvtps2pd.)

But the Linux kernel's printk doesn't have conversions for double anyway.

If compiled with -mfpmath=387 (the default for 32-bit code), values will stay in 80-bit x87 registers after inlining. But yes, with 64-bit code using the 64-bit default of -mfpmath=sse this would result in rounding off to float when loading back into XMM registers.

kernel_fpu_begin() saves the full FPU state, and avoiding SSE registers and only using x87 won't make it or the eventual FPU restore when returning to user-space any cheaper.

This gets flagged: `error: output operand 0 must use ‘&’ constraint`. Do you need `=&t`? gcc version 5.3.1 — Craig Estey, Feb 08 '16 at 22:46
I'm not seeing that on my box (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)). It even produces correct results. However, it also works with the `'&'` so will add that to my answer. — Gil Hamilton, Feb 08 '16 at 22:49
I've disassembled the "m" version [suggested by Iwillnotexist] and yours. Yours has more wrapper code. I didn't analyze which is correct/incorrect [if either] — Craig Estey, Feb 08 '16 at 22:56
@CraigEstey That's because the `t` and `f` constraints impose an (artificial) constraint that the input must be in x87 stack registers already, whereas mine does not. To make it as good as mine but with `t` and `f` I'd delete the explicit `fld` and `fst` instructions. — Iwillnotexist Idonotexist, Feb 08 '16 at 23:01
This gets flagged: `error: impossible constraint in asm` with or without the `&` symbol. GCC is `gcc (Debian 4.7.2-5) 4.7.2` — SUBmarinoff, Feb 08 '16 at 23:18
@GilHamilton You need to do _one_ more thing for this to be correct. Change the `f` input constraint to `t`, else the compiler may feel entitled to place the argument elsewhere than `st(0)`. — Iwillnotexist Idonotexist, Feb 10 '16 at 00:24
