Questions about inline performance

Question

I had some questions about using inline on functions in C and C++. I have been told to use it on small functions that I call frequently, but I want to understand exactly how it works. Here is an example snippet:

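/* Assumed definition of point3D; the original snippet does not show it. */
typedef struct {
   float x, y, z;
} point3D;

static inline point3D createPoint3D(float x, float y, float z){
   point3D newPosition;
   newPosition.x = x;
   newPosition.y = y;
   newPosition.z = z;
   return newPosition;
}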

1) What exactly does it do, and why does it help the code run faster? Is this an outdated optimization from the '90s?
2) Why am I only supposed to use it on small functions?
3) Would it be bad if I did it for big functions?
4) Is it bad to use it on a large number of functions?

Related note (easy to forget if you've never used inline): to make inlining work across compilation-unit boundaries, you have to put the *body* of the function in a header, not just the prototype; otherwise the compiler won't have the code to inline when it compiles the other compilation units (see also http://www.parashift.com/c++-faq-lite/inline-functions.html#faq-9.6). — Matteo Italia, Sep 12 '10 at 22:10
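A minimal sketch of what that looks like, assuming a hypothetical header named point3d.h and the point3D struct from the question (its exact definition is an assumption). Because the whole body lives in the header, every .c file that includes it can inline the calls; `static` gives each translation unit its own private copy in case the compiler decides not to inline, so the linker never sees duplicate symbols.

```c
/* point3d.h (hypothetical name): the full body is here, not just a
   prototype, so every .c file that includes it can inline the call. */
#ifndef POINT3D_H
#define POINT3D_H

/* Assumed layout of point3D, as in the question. */
typedef struct {
    float x, y, z;
} point3D;

/* static inline: if a call is not inlined, each translation unit gets
   its own internal copy, so there are no multiple-definition errors. */
static inline point3D createPoint3D(float x, float y, float z) {
    point3D p = { x, y, z };
    return p;
}

#endif /* POINT3D_H */
```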
There are some great answers about inline here as well: http://stackoverflow.com/questions/3647053/what-is-are-the-purposes-of-inline — default, Sep 12 '10 at 22:27
Note that the use of `inline` in C is quite different to `inline` in C++. The latter is more straightforward. [See here](http://stackoverflow.com/questions/6312597/is-inline-without-static-or-extern-ever-useful-in-c99) for C discussion. — M.M, Sep 15 '15 at 01:49
@M.M Thanks. This was a question I asked about five years ago; I now understand much more about the topic. — Justin Meiners, Sep 15 '15 at 03:57
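For the C side of M.M's remark, a minimal C99-style sketch (the function name square is hypothetical). In C99 and later, a definition marked only `inline`, with no `extern` declaration of it anywhere in the translation unit, is merely an "inline definition" and provides no external symbol, so a call the compiler chooses not to inline (for example at -O0) can fail to link. An `extern` declaration in exactly one .c file makes that file supply the external definition. In a real project the `inline` definition would sit in a header and the `extern inline` line in one .c file; they are collapsed into one file here so it compiles on its own.

```c
/* Plain C99 `inline` (no static, no extern): on its own this is only an
   inline definition and does not emit an externally visible symbol. */
inline int square(int x) {
    return x * x;
}

/* This declaration turns the definition above into an external
   definition for this translation unit, so out-of-line calls link. */
extern inline int square(int x);
```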

score 5 · Accepted Answer · answered Sep 12 '10 at 22:00

1) It's more like an outdated optimization from the '70s or (at most) '80s. Nearly any competent compiler can select functions for inline expansion without any help from you beyond enabling optimization in the first place.
2) What it's supposed to do is eliminate the overhead of calling the function. This matters mostly for tiny functions that do next to nothing. As it happens, these are common enough that getting even halfway decent performance out of C++ practically requires the compiler to expand functions inline more or less automatically.
3) It's generally pointless to use it at all.
4) Not usually. As noted above, when there's a benefit to the function being inline, the compiler can usually inline it automatically.

Two things to note: 1) most compilers can/will generate functions inline without the inline keyword, and 2) most compilers can/will ignore the inline keyword if they consider the function unsuitable for inline expansion (though, just FWIW, Microsoft has a __forceinline to overcome the latter if you're really sure you know better than the compiler).
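To make that kind of forcing portable, a common pattern is a wrapper macro; the name FORCE_INLINE and the helper dot3 below are hypothetical. `__forceinline` is MSVC's keyword, and `__attribute__((always_inline))` is the GCC/Clang counterpart, which is meant to be combined with `inline`. Treat this as a sketch: MSVC can still decline `__forceinline` in some cases and reports that with a warning.

```c
/* Hypothetical FORCE_INLINE: maps to each compiler's own "inline this
   even if your heuristics disagree" extension. */
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline /* unknown compiler: plain hint only */
#endif

/* Tiny dot product: a typical candidate for forced inlining. */
static FORCE_INLINE float dot3(float ax, float ay, float az,
                               float bx, float by, float bz) {
    return ax * bx + ay * by + az * bz;
}
```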

In my case, the compiler does not inline a simple getter function if it cannot see the function's definition, even at the -O3 optimization level. In other words, the first point holds only when the definition is visible, e.g. in the header file itself. — talekeDskobeDa, Nov 11 '19 at 17:18

score 4 · Answer 2 · answered Sep 12 '10 at 21:56

Please see the detailed information in the C++ FAQ. To quote it on inline functions:

When the compiler inline-expands a function call, the function's code gets inserted into the caller's code stream (conceptually similar to what happens with a #define macro). This can, depending on a zillion other things, improve performance, because the optimizer can procedurally integrate the called code — optimize the called code into the caller.

From section 9.3:

inline functions might make it faster: As shown above, procedural integration might remove a bunch of unnecessary instructions, which might make things run faster.

inline functions might make it slower: Too much inlining might cause code bloat, which might cause "thrashing" on demand-paged virtual-memory systems. In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.

inline functions might make it larger: This is the notion of code bloat, as described above. For example, if a system has 100 inline functions each of which expands to 100 bytes of executable code and is called in 100 places, that's an increase of 1MB. Is that 1MB going to cause problems? Who knows, but it is possible that that last 1MB could cause the system to "thrash," and that could slow things down.

inline functions might make it smaller: The compiler often generates more code to push/pop registers/parameters than it would by inline-expanding the function's body. This happens with very small functions, and it also happens with large functions when the optimizer is able to remove a lot of redundant code through procedural integration — that is, when the optimizer is able to make the large function small.

inline functions might cause thrashing: Inlining might increase the size of the binary executable, and that might cause thrashing.

inline functions might prevent thrashing: The working set size (number of pages that need to be in memory at once) might go down even if the executable size goes up. When f() calls g(), the code is often on two distinct pages; when the compiler procedurally integrates the code of g() into f(), the code is often on the same page.

inline functions might increase the number of cache misses: Inlining might cause an inner loop to span across multiple lines of the memory cache, and that might cause thrashing of the memory-cache.

inline functions might decrease the number of cache misses: Inlining usually improves locality of reference within the binary code, which might decrease the number of cache lines needed to store the code of an inner loop. This ultimately could cause a CPU-bound application to run faster.

inline functions might be irrelevant to speed: Most systems are not CPU-bound. Most systems are I/O-bound, database-bound or network-bound, meaning the bottleneck in the system's overall performance is the file system, the database or the network. Unless your "CPU meter" is pegged at 100%, inline functions probably won't make your system faster. (Even in CPU-bound systems, inline will help only when used within the bottleneck itself, and the bottleneck is typically in only a small percentage of the code.)

There are no simple answers: You have to play with it to see what is best. Do not settle for simplistic answers like, "Never use inline functions" or "Always use inline functions" or "Use inline functions if and only if the function is less than N lines of code." These one-size-fits-all rules may be easy to write down, but they will produce sub-optimal results.

So I can use it as much as I want on anything if I don't care about file size? — Justin Meiners, Sep 12 '10 at 22:13

score 3 · Answer 3 · answered Sep 12 '10 at 22:02

Don't worry about it. It's all the same until you measure. And once you measure, you will not notice a big difference between versions compiled with or without inline.

1) inline is a suggestion to the compiler to "inline" the function directly in the flow of the code rather than "call" it. This bypasses the need to set up a stack frame and do the other chores needed to call a function.

        NOT INLINE                    INLINE
        ...                           ...
        code                          code
        call fx    -\                 code from fx
        code        |                 code from fx
        call fx   --|                 code from fx
        ...         |                 code
                    |                 code from fx
        code <------/                 code from fx
        ...                           code from fx
        return                        ...

2) Use it wherever you want. The compiler will most likely ignore your suggestion.

3) Same as 2).

4) Measure. Experiment and compare.
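A minimal measurement sketch, assuming GCC, Clang, or MSVC for the NOINLINE macro (the names scale_call, scale_inline, bench_call, and bench_inline are hypothetical). The same body is compiled once with inlining forbidden and once as static inline, and each loop is timed with clock(). The loops return their sums so the compiler cannot discard the work.

```c
#include <time.h>

/* Forbid inlining of the "call" variant (compiler-specific extensions). */
#if defined(__GNUC__) || defined(__clang__)
#  define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE /* unknown compiler: cannot forbid inlining */
#endif

/* Identical bodies; only the inlining behavior differs. */
static NOINLINE float scale_call(float v, float k) { return v * k; }
static inline float scale_inline(float v, float k) { return v * k; }

/* Sums n scaled values through a real call; stores CPU seconds used. */
static float bench_call(long n, double *seconds) {
    clock_t start = clock();
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += scale_call((float)(i & 7), 0.5f);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return acc;
}

/* Same loop, but the helper is free to be inlined. */
static float bench_inline(long n, double *seconds) {
    clock_t start = clock();
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += scale_inline((float)(i & 7), 0.5f);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return acc;
}
```

Calling both from main with a large n at -O2 and printing the two times is one way to run the experiment; the numbers depend heavily on the compiler, flags, and machine, which is the point of measuring rather than guessing.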

score 2 · Answer 4 · answered Sep 12 '10 at 21:59

The inline keyword indicates that you think this function is a good candidate for including in place of a call to the function. It is best used for functions that are small, because each use of it puts a fresh copy of the function body at the point of use. Overuse could substantially increase the size of the calling code.

It is valuable because there are times when the optimizer could do a better job if it could see inside a small function. By putting the function body inline, the optimizer gets that chance. It also improves the locality of reference of the thread of execution, which can improve the performance of the instruction cache and pipeline.
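A hand-written illustration of that effect, assuming a hypothetical vec3 type and lengthSquared helper. The second function is not real compiler output; it only shows what an optimizer may be able to reduce the first one to once it can see the body, since the constant arguments let it fold the whole computation.

```c
/* Hypothetical vector type for this sketch. */
typedef struct {
    float x, y, z;
} vec3;

static inline float lengthSquared(vec3 v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

/* The caller as written: it passes a known constant vector. */
float caller_as_written(void) {
    vec3 v = { 1.0f, 2.0f, 2.0f };
    return lengthSquared(v);
}

/* Roughly what an optimizer may reduce the caller to after inlining:
   the struct, the call, and the arithmetic all fold to a constant.
   Written by hand for illustration, not actual compiler output. */
float caller_after_inlining(void) {
    return 9.0f;
}
```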

In classic C, the only way to get this effect was with a macro, but macros have the significant disadvantage that they are a pure textual replacement, and hence they will cause each of their arguments to be evaluated every time they appear in the replacement text. It is also non-obvious how to safely allow a macro to have local variables.
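A small sketch of the double-evaluation problem (the macro SQUARE_MACRO, the function square_inline, and the counting helper next_value are hypothetical). The macro substitutes its argument text twice, so a side-effecting argument runs twice; the inline function evaluates its argument once, like any other function call.

```c
/* Macro version: pure textual replacement, so SQUARE_MACRO(f()) becomes
   ((f()) * (f())) and f() is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline-function version: the argument is evaluated exactly once,
   before the body runs, as with any function call. */
static inline int square_inline(int x) {
    return x * x;
}

static int next_value_calls = 0; /* how many times next_value() has run */

/* Helper with a visible side effect: bumps the counter, returns 3. */
static int next_value(void) {
    next_value_calls++;
    return 3;
}

/* Uses the macro once; returns how many calls to next_value() it caused. */
static int macro_call_count(void) {
    next_value_calls = 0;
    (void)SQUARE_MACRO(next_value());
    return next_value_calls;
}

/* Uses the inline function once; returns the same count. */
static int inline_call_count(void) {
    next_value_calls = 0;
    (void)square_inline(next_value());
    return next_value_calls;
}
```

Both forms yield the same value here (9); the difference is only in how often the side effect happens, which is exactly the hazard described above.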

In C++, there is often a huge advantage to allowing the small accessor functions that are a common idiom of the language to be inline, so much so that functions whose bodies are defined in the class definition are implicitly marked inline.

A good optimizer will decide for itself when to actually use the function inline and when to call it normally, so there isn't usually much adverse effect to liberally marking functions as inline.
