September 28, 2007

Bugged by Excel

A colleague calls my attention to the fact that Excel 2007 is prone to a curious arithmetic miscalculation.

From AppScout:

We all learned how to multiply with pencil and paper, even great big numbers and decimals. But when it comes to something important like a blueprint or a scientific formula we reach for a calculator-- or a spreadsheet. That's much more reliable, right? Well, not if the spreadsheet is Excel 2007. Over the weekend a member of the microsoft.public.excel newsgroup revealed that Excel 2007 thinks that 850*77.1 is 100,000. What's the correct answer? Anybody? Anybody? Bueller? Anybody? Right, it should be 65,535. Other members verified that the error carries over into some (but not all) calculations based on the incorrect result. Microsoft has been informed of the bug, but hasn't yet formulated a response.

UPDATE: Microsoft recognizes the problem and assures us that Excel Will Learn to Multiply.

GOOD NEWS: The Excel team has dissected the problem in detail and is working feverishly to swat this Excel bug.

If it were just 850*77.1 that gave a wrong answer, we could probably work around that. But there are tons of other problem numbers, as I discovered for myself. I set up a spreadsheet to divide 65,535 by every number from 1 to 65,535 itself, then multiply the number by that result. So, for example, it divided 65,535 by 26 to get 2,520.577. Then it multiplied 26 by 2,520.577 to get... 100,000?! Over ten thousand of these simple calculations gave the wrong answer.

We won't know just why the problem comes up until Microsoft speaks out, but there is one thing about 65535-- it's the very largest 16-bit number. In binary it's a string of 16 ones. In hexadecimal (the programmer's friend) it's FFFF. But converting the "problem" results to hexadecimal in Excel yields FFFE. That's surely a clue. Meanwhile, if you have any spreadsheets where some results hit the range around 65535, it might be a good idea to double-check with your trusty calculator... or a pencil.

The problem appears just to be with Excel 2007. So those of us who do math the old-fashioned way-- with Excel 2003-- are probably still ok.

Technorati Tags:

Posted by James Hamilton at September 28, 2007 06:57 AM

digg this | reddit

The problem most likely is a boundary condition on a 16-bit representation. 65535 is actually {2^16 - 1}, and is always ordinal zero (i.e. we start counting at at 0, not 1).

Posted by: caveat bettor at September 28, 2007 08:19 AM

Was it really such a big deal?

Explaining the Excel Bug
This item ran on the Joel on Software homepage on Wednesday, September 26, 2007

By now you've probably seen a lot of the brouhaha over a bug in the newest version of Excel, 2007. Basically, multiplying 77.1*850, which should give you 65,535, was actually displaying 100,000.

Before I try to explain this, I should disclose that I did work on the Excel team, but that was thirteen years ago. I haven't been there for a long time. I don't even think I know anyone on that team any more. I'm just trying to explain the bug a little bit as a public service.

The first thing you have to understand is that Excel keeps numbers, internally, in a binary format, but displays them as strings. For example, when you type 77.1, Excel stores this internally using 64 bits:

0100 0000 0101 0011 0100 0110 0110 0110
0110 0110 0110 0110 0110 0110 0110 0110

The display is showing you four characters: "7", "7", ".", and "1".

Somewhere inside Excel is a function that converts binary numbers to strings for displaying. This is the code that has the bug that causes a few numbers which are extremely close to 65,535 to be formatted incorrectly as 100,000.

If you use the number further along in calculations, for example, if you add 2 to the results, you'll get the right thing.

=77.1*850 -> displays 100000

=77.1*850+2 -> displays 65537, correctly.

Just to throw people off, this bug also exists for a few numbers which are extremely close to 65,536. They display incorrectly as 100,001.

=77.1*850+1 -> displays 100,001, incorrectly.

This is still only a bug in the number formatting code; if you try to make a chart with that number in it, you'll get a correct chart.

Now... you may have noticed that I said that this bug exists for numbers which are extremely close to 65,535, but not for 65,535 itself. Indeed if you enter 65,535 you see 65,535. But, you notice, 77.1 * 850 should be exactly 65,535, not extremely close to 65,535!

Look closely at the binary representation for 77.1:

0100 0000 0101 0011 0100 0110 0110 0110
0110 0110 0110 0110 0110 0110 0110 0110

See how there's a lot of 0110 0110 0110 there at the end? That's because 0.1 has no exact representation in binary... it's a repeating binary number. It's sort of like how 1/3 has no representation in decimal. 1/3 is 0.33333333 and you have to keep writing 3's forever. If you lose patience, you get something inexact.

So you can imagine how, in decimal, if you tried to do 3*1/3, and you didn't have time to write 3's forever, the result you would get would be 0.99999999, not 1, and people would get angry with you for being wrong.

The same thing happens in binary with numbers ending in 0.1: they are repeating decimals, so when you do mathematical operations on them, very small insignificant errors creep in somewhere way to the right of the decimal point. (PS: same for .2, .3, .4, .6, .7, .8, and .9, but not .5).

The IEEE has a standard, IEEE 754, for how to represent floating point numbers in binary, and this is what almost everybody uses, including Excel, and they have for a really long time, and it means sometimes you get imprecise results when you add a lot of 0.1's together, but if you're rounding the numbers to a reasonable number of decimal points, you won't really care.

Back to the Excel bug, which is a genuine bug, not just an artifact of this IEEE 754 stuff. Since 77.1 has no exact representation, Excel stores it as

0100 0000 0101 0011 0100 0110 0110 0110
0110 0110 0110 0110 0110 0110 0110 0110

and then when you try to multiply it by 850, you get something very close to 65,535, but not exactly 65,535, because of the fact that 77.1 wasn't stored exactly because that would take infinite memory. And this number, which is very close to 65,535, happens to be one of only 12 possible floating point numbers which trigger this bug in Excel.

OK, Q&A.

Q: Isn't this really, really bad?

A: IMHO, no, the chance that you would see this in real life calculations is microscopic. Better worry about getting hit by a meterorite. Microsoft, of course, will be forced to tell everyone "accuracy is extremely important to us" and I'm sure they'll have a fix in a matter of days, and they'll be subjected to all kinds of well-deserved ridicule, but since I don't work there I'm free to tell you that the chance of this bug actually mattering to you as an individual is breathtakingly small.

Q: Shouldn't they be testing for these kinds of things?

A: I'll bet that most of the numeric testing done on the Excel team is done automatically with VBA code. Cells containing this value display as 100,000, but from VBA, they're going to look like 65,535 (since the number would be passed into the Basic runtime in binary, before the display formatting.) I'm sure there's plenty of code to test display formatting, but with a bug like this that only happens on 12 out of 18446744073709551616 possible floating point binary numbers, it's unlikely that any set of black-box tests would cover this case.

Q: What caused the bug?

A: I'm not sure exactly, since I don't have the code. Off the top of my head, I can't think of anything that would cause this behavior. Play around with Quanfei Wen's IEEE-754 calculator, maybe you'll find something.

Q: Why not use "exact" (decimal) arithmetic?

A: It's much slower than floating point arithmetic, since there's no hardware on your CPU chip to do it for you natively.

Over the years, Microsoft got so much heat for floating point rounding artifacts in the Windows Calculator that they rewrote it to use an arbitrary-precision arithmetic library. Since you have to poke at Windows Calculator with a stick, it doesn't have to be as fast as Excel. That said, CPUs have gotten pretty fast. I'll bet an arbitrary-precision version of Excel would perform pretty well these days. Still, the Microsoft Excel support team has spent the last 20 years defending IEEE 754, and it's not surprising that they've started to believe in it.

And let's face it -- do you really want the bright sparks who work there now, and manage to break lots of perfectly good working code -- rewriting the core calculating engine in Excel? Better keep them busy adding and removing dancing paper clips all day long.

http://www.joelonsoftware.com/items/2007/09/26b.html

Posted by: me at September 28, 2007 08:39 AM

"FFFF" is right. Thanks for the heads up.

Why do economists continue to insist that monopoly has no market effect until one company obtains 70-75% of a market? Long before there are any obvious price effects, there are other very obvious effects, such as those on quality. Or consider the Verizon/NARAL case, in which a telecommunications company starts deciding what can and cannot be transmitted over its wires.

Posted by: Charles at September 28, 2007 09:17 AM

I guess I will stick with Excel 97. It still works. If it's not broken, don't fix it.

Posted by: Robert Schwartz at September 28, 2007 09:55 AM

OpenOffice!

Posted by: JO at September 28, 2007 12:36 PM

I use Excel 2007 and do not have problem. Maybe the update is already out there

Posted by: Venkat Rao at September 28, 2007 01:05 PM

Excel has many, many bugs in regards its sloppy statistical functions. But the new version did basic arithmetic incorrectly when first released. So why on earth does Microsoft have any market capitalization given the clunky nature of its operating system and in the bugs in its Microsoft Office suite? Oh yea - something about monopolizing the market. I'm buying a Mac tomorrow!

Posted by: pgl at September 28, 2007 08:23 PM

Microsoft Word's readability statistics calculated the Flesch Grade Level incorrectly up until Office 2003 - essentially everything above 12.0 was reported as 12.0. This bug was there for over a decade. Considering some of the convoluted writing styles I have to read, this is probably not important to economists, but it's an example of MS not having to fix something in a product without serious competition.

Posted by: dug at September 28, 2007 10:06 PM

Openoffice, there you go."OpenOffice bug hits multiple operating systems
By Liam Tung
Special to CNET News.com
Published: September 25, 2007, 8:38 AM PDT

Security experts have discovered vulnerabilities in OpenOffice.org that could allow attackers to remotely execute code on Linux, Windows or Apple Mac-based computers."

http://www.news.com/OpenOffice-bug-hits-multiple-operating-systems/2100-1002_3-6209919.html

Posted by: me at September 29, 2007 07:39 AM

Thank you for that explication 'me'.

Posted by: calmo at September 29, 2007 11:36 PM

"...do you really want the bright sparks who work there now, and manage to break lots of perfectly good working code -- rewriting the core calculating engine in Excel?"

not really, I'd settle for them fixing the code that converts a floating point number to a string for display, a piece of code that one would have thought had been working for decades since sprintf has been around that long.

Posted by: supersaurus at September 30, 2007 08:06 AM

Not to get religious about it, but there doesn't seem to be a problem with it in the Mac version of Excel...

Posted by: peter at September 30, 2007 02:06 PM