How does Excel's INDEX know to use REFERENCE form?

Question

I was trying to build a formula that can SPILL to return all unique text rows in a column apart from the first (header).

I found a solution at https://superuser.com/a/1259946/1137272 which works and gives me what I need using the INDEX function - which is great and I've successfully translated it from the SUM() application described into my application with UNIQUE(), but I'm having trouble understanding how it achieves the result.

The solution as provided is

=SUM(A2:INDEX(A:A, ROWS(A:A)))

in my case this translates to the functioning but more complex:

=UNIQUE(FILTER('SheetName'!D2:INDEX('SheetName'!D:D,ROWS('SheetName'!D:D)),ISTEXT('SheetName'!D2:INDEX('SheetName'!D:D,ROWS('SheetName'!D:D)))))

I follow that ROWS(A:A) gives the last row number (1048576 in my case). INDEXing by N into an array returns the Nth value in the array, so I would assume INDEX would return the value of A1048576 - in my case an empty cell, and the SUM formula would expand to SUM(A2:[blank]) or SUM(A2:0) and thus give an error.

However, this isn't what happens - apparently there's also a 'reference' form of the INDEX function and somehow INDEX knows to return the cell reference rather than the value of the cell, but I don't understand how this is triggered. It looks to me like the syntax for both forms is exactly the same: INDEX(array/reference, row_num, [optional_arguments])

In cases like this where an array and reference look identical (A:A), how does it know when to return the value and when to return the cell reference?

Reddy Lutonadio · Answer 1 · 2022-12-07T19:28:07.227

In cases like this where an array and reference look identical (A:A), how does it know when to return the value and when to return the cell reference?

As stated here:

It's important to note that the INDEX function returns a reference as a result.

The formula mentioned by the OP is =SUM(A2:INDEX(A:A, ROWS(A:A))). Let start only with the INDEX part (INDEX(A:A, ROWS(A:A))). If this part was the only part of the formula, INDEX would reference A1048576 and would return its value. You may consider this part of the formula to be analogous to =A1048576.

In the case of the whole formula (SUM(A2:INDEX(A:A, ROWS(A:A))), INDEX is combined with SUM function. Thus, as already mentioned the INDEX function returns a reference, when INDEX(A:A, ROWS(A:A)) is processed by Excel, the reference is A1048576. Excel use this reference in the SUM function. Thus, you will have SUM(A2:A1048576).

Good rule of thumb to differentiate between array and reference forms, if INDEX function is used with:

another function as the example above, then it's the reference form.
multiple ranges as its first argument, then it's the reference form.
only one range as its first argument, then it's the array form.

How does Excel's INDEX know to use REFERENCE form?

1 Answers1