Why Accidentally Quadratic?
I started this blog because, over my career so far as a software engineer, I’ve kept running into software that was slow and wondering why. When I had time to point a profiler at these programs and debug the slowness, I would very often discover quadratic behavior, which could usually be drastically improved to linear with just a little work. I kept collecting these examples in a text file, and finally I got around to writing them up for this blog.
So why quadratic? Why did I never stumble on accidental cubic behavior? A coworker, learning about this blog concept, remarked that “accidentally quadratic” is less exciting than, say, accidentally factorial. But I’ve never seen one of those in production software, either.
I think it comes down to two reasons, working in concert: It’s easy to write accidentally quadratic systems, and it’s easy to not notice that you’ve done so.
Why is it so easy to write quadratic code? Consider linear runtime: In many ways, “linear” is a default runtime for application code. “Linear” means you took some input and did something simple to each element. That’s a very broad description, and linear is the best you can ever hope for if you need to consider all your input. So linear code abounds.
Linearity, however, is not closed under composition. (Aside: being polynomial is, which is part of why P is such a useful complexity class.) If you do a linear operation linearly many times, you get quadratic! And most programs and systems are built by composing components or functions into larger pieces. So it’s easy to write something linear that’s perfectly fine and normal on its own, until someone else calls it on each input element, and … accidentally quadratic.
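To make that concrete, here’s a minimal, hypothetical sketch in Python (the function names are mine, not drawn from any real codebase): each piece is a perfectly reasonable linear pass on its own, but nesting one inside the other yields quadratic behavior, while a one-line change restores linearity.

```python
# Hypothetical sketch: each piece below is a reasonable linear pass on its own,
# but composing them yields quadratic behavior overall.

def dedupe_quadratic(items):
    """Remove duplicates while preserving order."""
    seen = []                    # a plain list
    out = []
    for item in items:           # linear: one pass over the input...
        if item not in seen:     # ...but `in` on a list is itself a linear scan,
            seen.append(item)    # so the loop as a whole is O(n^2)
            out.append(item)
    return out

def dedupe_linear(items):
    """Same result, but set membership is O(1) on average, so this stays O(n)."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

Nothing about `dedupe_quadratic` looks wrong in isolation; the blowup only appears because the linear scan is the thing that gets repeated once per element.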
And having written a quadratic algorithm, it’s easy to not notice. The reason is partially that computers are fast, but equally important, I think, is that computer performance spans such a wide range. A single CPU cycle is <1ns on most devices these days. SSD or disk accesses range from 100µs to 10ms. Network latencies go from ~1ms within the same datacenter, to ~200ms to go around the world.
With a range of 10⁹ between what might, in different contexts, be called single operations, program performance on small inputs is almost always dominated by constant factors, and it’s easy for an O(n²) to vanish in the noise. And unless you’re rigorously benchmarking an application, odds are you mostly test on smallish inputs, because those are easy to come by, and also faster to test!
And quadratic growth starts out slowly enough that, if your program gets gradually slower as your dataset grows, it’s easy to write the slowdown off and focus on the constant factors. You really need to jump to big inputs and draw some graphs, or do some careful source-code analysis, to spot the n².
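One low-effort version of that analysis, sketched below in Python (with hypothetical toy functions standing in for whatever you actually suspect is slow), is to time a routine at doubling input sizes and look at how the runtime grows: roughly 2× per doubling suggests linear behavior, roughly 4× suggests quadratic.

```python
# A rough sketch of the "draw some graphs" approach: time a routine at doubling
# input sizes and look at the growth ratio between runs. Roughly 2x per doubling
# suggests linear; roughly 4x suggests quadratic. The toy functions here are
# hypothetical stand-ins for real code.

import timeit

def linear(items):
    return sum(items)                        # one pass: O(n)

def quadratic(items):
    return sum(x in items for x in items)    # a linear scan per element: O(n^2)

def measure(fn, sizes, repeats=3):
    prev = None
    for n in sizes:
        data = list(range(n))
        t = timeit.timeit(lambda: fn(data), number=repeats) / repeats
        growth = f"{t / prev:.1f}x" if prev else "-"
        print(f"{fn.__name__:>9}  n={n:>6}  {t:8.4f}s  growth={growth}")
        prev = t

for fn in (linear, quadratic):
    measure(fn, [1_000, 2_000, 4_000, 8_000])
```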
My request to you, readers, is to do the analysis. Draw the graphs. Don’t take slow software for granted. Software doesn’t have to be slow, and you don’t have to live with slow software. The vast majority of the entries on the blog were found and/or fixed by users, not by the original developers. You can help. Profile your tools. Blog your results. File bugs. Be part of the solution.
