13

A client contacted me, complaining of Excel spreadsheets that were taking far too long to open. They use Excel for creating invoices, so they have hundreds of Excel spreadsheets with light formatting and very simple calculations. When sorting the spreadsheets by size, I saw that while most of the spreadsheets ranged from 10-250k, there were a handful of spreadsheets with filesizes of 2-3 MB and more. Oddly, the filesizes weren't huge, they aren't holding much data, just a bit of formatting, maybe two or three pages of printed invoices, but they amount of data was almost identical to the smaller sized (and normal opening) spreadsheets.

When opening the file, the amount of RAM needed would spike from 3 MB to 400 MB, and it would completely occupy a single core (tested on a dual-core in the office, and my quad-core laptop), while starting. I thought that they had somehow caught some VBA code, but there are no macros, no VBA code. Ctrl+End shows 39 rows and about 12 columns (ends at M). I've even deleted the data, row by row, or column by column, until there is no data left, and it still gives me the same problem.

I've reviewed plenty of Google searches, but I've gotten nowhere. Can anyone offer any help?

6 Answers6

7

I've found the answer to my problem!

Using the clues given to me by allquixotic, I opened the xlsx files in 7-Zip, and compared the file sizes. There was one file that was much larger than the others. The file xl\drawings\drawing1.xml was a file that contained multiple references to formats, over and over again.

I spent a few hours trying to figure out a pattern, but wasn't able to. Nothing I was able to do could get it working! Then, after a fit of aggravation, I just deleted the damn file, and tried to reopen in Excel (2010- didn't test it in Excel 2007).

It complained that the file was damaged, and asked if I'd like to attempt a repair. Repairing the file simply erased the shape, but didn't change the formatting of the file at all. I had to resave the file as the same file, which was a bit odd, but it worked!

As I'd mentioned before, I hadn't tested this solution for any products other than Excel 2010, so I don't know if the shape file was critical to Excel 2k7 or any OpenOffice products. But, if you have a similar problem, hopefully this could be helpful.

6

Are you using the binary .xls format or the new XML-based .xlsx format? In general the .xlsx format results in a dramatic reduction in filesize.

Check for things like an excessive number of styles stored in the document.

Try to "remove personal information" (a feature of Excel/Word/etc.) to clean up certain kinds of cruft that might be sitting in the file.

If the spreadsheet is, or has previously been shared, it may have old share data stored in it.

One simple fix is to copy and paste just the relevant data from that spreadsheet to a new one, then save it in .xlsx format, and see how small it is. If it's very small then you have your answer -- Excel is doing poor accounting of its internal file format's data structures.

Also check for complicated or circular formulas and references to external sheets (especially those on network drives). If you think a formula might be slow, you can step through it using the formula auditing tool.

Last thing: if you save it as an .xlsx and it's still big, try downloading the OpenXML SDK Productivity tool: http://www.microsoft.com/en-us/download/details.aspx?id=5124

Open up the .xlsx in that and just take a look at all the elements within the file and see if anything is obviously extraneous. This may require knowledge of XML and of the specific OpenXML schemas, but it's a surefire way to find out what is causing the bloat.

P.S. -- if this kind of thing annoys you, stop using Microsoft formats / programs or suggest to your customer to do so. If you google around for "Microsoft Access database bloat" you'll see that Microsoft has a long history of letting their proprietary formats leak tons of useless data to disk that never gets cleaned up. It's like a really nasty memory leak that eats your disk instead of your RAM.

allquixotic
  • 34,882
2

Another possible solution would be:

  1. Create a copy of the problematic Excel file
  2. Open that copy
  3. Press CTRL+A and then click on "Clear" -> "Clear Formats"
  4. Repeat step 3 on every worksheet
  5. Save the file and try to reopen it

When there is too much formatting, especially if there are many different styles applied to many individual cells, Excel really struggles trying to apply formatting when opening files.

Naturally, when you remove all formatting you'll end up with just text with default style applied to it in all of the cells with borders, shading, etc gone. But this way you can pinpoint the cause of the problem.

ruslaniv
  • 932
0

In my case the problem was related to "Format as table". The table had 16000 empty columns. I guess this is one of the most frequent reasons for this kind of problem.

Revious
  • 365
0

I had a look at all the conditional formatting rules, in the whole worksheet, and there were loads. I went through and cleared them all, and then created the ones I actually needed, and now it is really fast. Also got rid of all the 'Connections' I didn't need, and save as .xlsb after it all worked well, and it is even faster now :o)

Paul
  • 1
0

For a dramatic reduction in size you might try the format .xlsb. Should reduce more than any other XML-based format that excel provides.

Open one of those spread sheets and take a look at the format. Look for anything usual. Try to copy the entire column to another excel spreadsheet and paste values only (with no formatting whatsoever). See if it helps.

TwirlMandarin
  • 1,925
  • 2
  • 25
  • 29