Indeed, using lastIndex is slower, but the order of magnitude is quite surprising.
So, let's dive into it :)
I slightly changed your test to move the "critical sections" into functions and to keep IO, variable reassignment, allocation, etc. out of the measured region:
// actually, we should remove time-calculation from those methods
// as they are not important in the test we perform here
// 
// btw. Kotlin supports time measurement in its standard library :)
// https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.system/measure-time-millis.html 
fun bySizeMinusOne(): Long {
    val tar = 2
    val nums = intArrayOf(0, 1, 2, 2, 3, 0, 4, 2)
    var count = 0
    var i = 0
    val begin = System.nanoTime()
    val n = nums.size - 1
    while (i <= n) {
        if (nums[i] != tar) {
            nums[count++] = nums[i]
        }
        i++
    }
    val end = System.nanoTime()
    return end - begin
}
fun byLastIndex(): Long {
    val tar = 2
    val nums = intArrayOf(0, 1, 2, 2, 3, 0, 4, 2)
    var count = 0
    var i = 0
    val begin = System.nanoTime()
    val n = nums.lastIndex
    // val n = nums.size - 1
    while (i <= n) {
        if (nums[i] != tar) {
            nums[count++] = nums[i]
        }
        i++
    }
    val end = System.nanoTime()
    return end - begin
}
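As the comment at the top of the snippet suggests, the timing can also be pulled out of the measured functions. Here is a minimal sketch of that idea using `measureNanoTime` from the standard library. The helper `removeTarget` and its `useLastIndex` flag are hypothetical names introduced only for this illustration. They are not part of the original test.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: the loop under test, with no timing code inside.
// Returns how many elements were kept at the front of the array.
fun removeTarget(nums: IntArray, tar: Int, useLastIndex: Boolean): Int {
    var count = 0
    var i = 0
    val n = if (useLastIndex) nums.lastIndex else nums.size - 1
    while (i <= n) {
        if (nums[i] != tar) {
            nums[count++] = nums[i]
        }
        i++
    }
    return count
}

fun main() {
    // Allocate outside the measured block so only the loop is timed.
    val nums = intArrayOf(0, 1, 2, 2, 3, 0, 4, 2)
    var kept = 0
    val elapsed = measureNanoTime {
        kept = removeTarget(nums, 2, useLastIndex = true)
    }
    println("kept=$kept, elapsed=$elapsed ns")
}
```

The rest of this answer keeps the original two functions, so that the bytecode shown below still matches the code.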
So, let's run our tests:
fun main() {
    val bySize = bySizeMinusOne()
    val byLastIndex = byLastIndex()
    println(bySize)
    println(byLastIndex)
}
This prints:
1090
22942410
So, the next step to find out what's going on is to inspect the bytecode generated for our code.
The only difference is:
> size minus one
L5
 LINENUMBER 16 L5
 ALOAD 1
 ARRAYLENGTH
 ICONST_1
 ISUB
 ISTORE 6
> last index
L5  
 LINENUMBER 35 L5   
 ALOAD 1    
 INVOKESTATIC kotlin/collections/ArraysKt.getLastIndex ([I)I    
 ISTORE 6   
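And here we are. Calling a method on another class is not free: before the first call, the JVM needs to load that class (here, ArraysKt), and that one-time cost ended up inside our measurement.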
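One rough way to see this one-time cost is to time the very first call to `lastIndex` and then a second call in the same process. This is only a sketch: `firstAndSecondCall` is a hypothetical helper, and the actual numbers depend heavily on the JVM and the machine, so no expected output is shown.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: times the first and the second call to lastIndex.
// It is only meaningful if nothing has touched lastIndex earlier in the process.
fun firstAndSecondCall(): Pair<Long, Long> {
    val array = intArrayOf(0, 1, 2)
    var sink = 0 // use the results so the calls are not trivially dead code
    val first = measureNanoTime { sink += array.lastIndex }
    val second = measureNanoTime { sink += array.lastIndex }
    check(sink == 4) // 2 + 2, the last index of a 3-element array, twice
    return first to second
}

fun main() {
    val (first, second) = firstAndSecondCall()
    println("first call:  $first ns")
    println("second call: $second ns")
}
```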
So, let's "improve" our testing by pre-loading methods/classes we need:
fun main() {
    // preload
    val array = intArrayOf() // load class
    array.size
    array.lastIndex
    val byLastIndex = byLastIndex()
    val bySize = bySizeMinusOne()
    println(bySize)
    println(byLastIndex)
}
Then we get something like:
420
1620
So the remaining overhead is expected (we can see the difference in the bytecode: a static method call instead of a plain ARRAYLENGTH).
There is another aspect: when compiling a production-ready jar, a number of optimizations can be applied. Whenever we test such things, we need to remember to move the critical section into a function that does not include the things mentioned here (IO, memory allocation, etc.). And we should not rely on a single result: we should run the tests multiple times and look at the average, min/max, percentiles, etc.
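To make the last point concrete, here is a minimal sketch of a repeated measurement with a warm-up phase and a few summary statistics. The helpers `sample` and `percentile` and the run counts are assumptions made for this illustration. For serious measurements a dedicated harness such as JMH is usually a better choice than a hand-rolled loop like this one.

```kotlin
import kotlin.math.ceil
import kotlin.system.measureNanoTime

// Hypothetical helper: runs `block` `warmup` times untimed, then `runs` times timed.
// Returns the timings in nanoseconds, sorted ascending.
fun sample(warmup: Int, runs: Int, block: () -> Unit): LongArray {
    repeat(warmup) { block() }
    val timings = LongArray(runs) { measureNanoTime(block) }
    timings.sort()
    return timings
}

// Nearest-rank percentile on an already sorted, non-empty array.
fun percentile(sorted: LongArray, p: Double): Long {
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}

fun main() {
    val nums = intArrayOf(0, 1, 2, 2, 3, 0, 4, 2)
    var sink = 0 // use the results so the work is not trivially dead code

    val bySize = sample(warmup = 10_000, runs = 1_000) { sink += nums.size - 1 }
    val byLastIndex = sample(warmup = 10_000, runs = 1_000) { sink += nums.lastIndex }

    for ((name, t) in listOf("size - 1" to bySize, "lastIndex" to byLastIndex)) {
        println(
            "$name: min=${t.first()} avg=${t.average()} " +
                "p50=${percentile(t, 50.0)} p99=${percentile(t, 99.0)} max=${t.last()}"
        )
    }
    println(sink)
}
```

Operations this small are close to the resolution of `System.nanoTime()`, so the individual timings are noisy. The distribution over many runs is more informative than any single value.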