in reply to Re: Spreadsheet::WriteExcel large files (text versus binary format)
in thread Spreadsheet::WriteExcel large files

What is confusing for me is that both the 6 MB file and the 100 MB file are Microsoft Office documents that are not ASCII encoded.
guyn@il-kblwe02>ls -1sh
total 153M
6.1M linking_optimization_results_small.xls
 47M linking_optimization.txt
100M linking_optimization.xls
guyn@il-kblwe02>file linking_optimization_results_small.xls
linking_optimization_results_small.xls: Microsoft Office Document
guyn@il-kblwe02>file linking_optimization.xls
linking_optimization.xls: Microsoft Office Document
Also, they have roughly the same number of lines:
guyn@il-kblwe02>wc -l linking_optimization.xls
304562 linking_optimization.xls
guyn@il-kblwe02>wc -l linking_optimization_results_small.xls
299554 linking_optimization_results_small.xls
So why is one so much larger than the other?

Re^3: Spreadsheet::WriteExcel large files (text versus binary format)
by BrowserUk (Patriarch) on Jan 02, 2012 at 12:47 UTC
    What is confusing for me is that both the 6 MB file and the 100 MB file are Microsoft Office documents that are not ASCII encoded.... So why is one so much larger than the other?

    My guess is that the smaller file contains just the results, whereas the larger contains the formulae used to derive those results. But that is only a guess.

    Also, they have roughly the same number of lines

    Using wc -l on binary files is not useful. It only tells you how many bytes with the value 10 decimal (0x0A, line feed) the file contains. In a binary file those bytes are probably not newlines at all, but rather just bytes within packed binary values that happen to have the same value as a newline.
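
To illustrate the point about wc -l, here is a minimal sketch that writes five arbitrary bytes, two of which are line feeds (0x0A, decimal 10), and compares wc -l with a plain count of LF bytes. The helper name count_lf and the temporary sample file are made up for this illustration; they are not from the original thread.

```shell
#!/bin/sh
# Hypothetical helper: count LF (0x0A) bytes in a file by deleting
# every other byte and counting what remains.
count_lf() {
    tr -cd '\n' < "$1" | wc -c | tr -d ' '
}

# Temporary sample file, removed when the script exits.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Five bytes of "binary" data: 0x01 0x0A 0x02 0x0A 0x03
printf '\001\012\002\012\003' > "$sample"

# wc -l counts newline bytes, whether or not they end a line of text.
lines=$(wc -l < "$sample" | tr -d ' ')
lf_bytes=$(count_lf "$sample")

echo "wc -l reports $lines; LF byte count is $lf_bytes"
```

The two numbers agree because wc -l is only counting LF bytes; nothing about the result says the file holds lines of text, which is why the counts for the two .xls files are not a meaningful comparison.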

    I would have thought your simplest option would be to open each of the files using Excel (or other program that can read .xls files) and inspect what they each contain.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      They both contain the same data. The smaller file was created by opening the large file and saving it again (hence my confusion).
      It makes sense that the formulas might have been lost in the process, but it is still surprising that the size difference is so huge.

        .xls files can contain all sorts of stuff. In addition to the formulae and values, they can also contain whole libraries of macro code, lookup tables, formatting instructions, etc. I think they can also contain embedded images and graphs, though I'm not sure about that. They are also known to contain all sorts of other crap, some of which can have security implications.

        As you are creating the smaller file by only copying over the values of a range of cells, all that other stuff will not exist in the file created.

