
I built an Excel 2007 spreadsheet which contains a large table of source data (about 500,000 rows and 10 columns). I need to extract data from this large table for my analysis. To extract and aggregate data I usually use the SUMIF, VLOOKUP/HLOOKUP and INDEX+MATCH functions.

I recently learned about the GETPIVOTDATA function, which makes it possible to extract data from a pivot table. To use it, I first need to convert my large source table into a pivot table; after that I can extract data with GETPIVOTDATA.

Would you expect a performance improvement if I used GETPIVOTDATA to extract and aggregate the data instead? I would expect that the aggregated values are pre-calculated within the underlying pivot object, and that performance would therefore be better.

If performance is better, are there any reasons not to follow this approach? To be clear, there is no need to refresh the pivot table, because it contains the source data (which sits at the beginning of the calculation chain).

Ruut

4 Answers


I did a few performance tests on a dual core 2.33 GHz 2 GB RAM desktop PC with Excel 2007.

The lookup was done on a table with 241,000 records. The results, from fastest to slowest:

  1. With the INDEX-MATCH function on a sorted list, the number of lookups per second was 180,000!! (based on 1,440,000 lookups in 8 seconds). More info on how to implement sorted lookups in Excel can be found here; scroll down to the section INDEX-MATCH in One Formula, Sorted Data

  2. With the GETPIVOTDATA function, the number of lookups per second was 6,000 (based on 250,000 lookups in 40 seconds)

  3. With the GETPIVOTDATA function using the very flexible single-argument string syntax (see here), the number of lookups per second was 2,000 (based on 250,000 lookups in 145 seconds)

  4. With the INDEX-MATCH function on an unsorted list, the number of lookups per second was 500 (based on 20,000 lookups in 35 seconds)
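The sorted-data pattern in option 1 looks roughly like the following (the range names KeyCol and ValueCol and the lookup cell E2 are placeholders for illustration). The approximate-match MATCH (third argument 1) performs a binary search on the sorted key column, and the outer IF guards against a near-miss when the key is absent:

```
=IF(INDEX(KeyCol, MATCH(E2, KeyCol, 1)) = E2,
    INDEX(ValueCol, MATCH(E2, KeyCol, 1)),
    NA())
```

Binary search is what makes this so fast: each lookup touches only about log2(241,000) ≈ 18 cells instead of scanning the whole column.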

The results do not change when the lookup function refers to an Excel Table instead of a named range.

So, to answer the question: lookups with GETPIVOTDATA are about 10 times as fast as a regular (unsorted) INDEX-MATCH lookup, but the biggest performance improvement is achieved by sorting your source data. Sorting your source data can make your lookups about 400 times as fast.

Ruut

Doing the lookups in VBA (using a Scripting.Dictionary) is by far the fastest way. See this: https://stackoverflow.com/questions/18656808/how-to-optimize-vlookup-for-high-search-count-alternatives-to-vlookup

jj2j

Using GetPivotData only gives you access to whatever is visible in the PivotTable report. If you are the sole user of this spreadsheet then this may be a viable approach for you.

If you can design the pivot table to do most of your aggregations for you, then using GetPivotData will be faster.
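A GetPivotData call then looks roughly like this (the field and item names here are invented for illustration; the second argument is any cell inside the pivot table report):

```
=GETPIVOTDATA("Amount", $A$3, "Region", "West", "Month", "Jan")
```

This returns the aggregated Amount value for Region = West and Month = Jan, exactly as it is shown in the report, which is why it only works for combinations the pivot table actually displays.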

I have not tested GetPivotData performance, but I would expect it to be slower than a Binary Search Lookup/Match on sorted data.


I have the same issue on a daily basis: large numbers of rows in multiple data tables in Excel.

Currently the only solution that makes extremely large tables usable is to export them to a database server and write SQL queries to do the SUMIFs, VLOOKUPs and aggregation.

You can use Excel to create the SQL queries.
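For example, a worksheet formula can concatenate cell values into one query per row and the results can be copied into the database client (the table and column names here are made up for illustration):

```
="SELECT SUM(amount) FROM sales WHERE region = '" & A2 & "';"
```

Filling this formula down the key column generates one SUMIF-equivalent query per key.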

Over the years I have exported sheets/tables to MySQL and MS SQL Server Express, then connected to them with Excel and written SQL queries.

The server does the processing faster than Excel, and if the database is on a different machine, performance increases further since your PC's resources are not being used for the calculations.

There are other benefits to this solution as well, like ETL automation and sharing a connection string rather than a big spreadsheet.

BOB