Coupling and the Maintainability of the Linux Kernel
~ by Dr Stupid
A recently
presented paper has the following abstract, something that would
certainly gain the attention of anyone interested in Linux kernel
development:
Categorization of Common Coupling and
Its Application to the Maintainability of the Linux Kernel
Data coupling between modules, especially common coupling, has long
been considered a source of concern in software design, but the issue
is somewhat more complicated for products that are comprised of kernel
modules together with optional nonkernel modules. This paper presents a
refined categorization of common coupling based on definitions and uses
between kernel and nonkernel modules and applies the categorization to
a case study.
Common coupling is usually avoided when possible because of the
potential for introducing risky dependencies among software modules.
The relative risk of these dependencies is strongly related to the
specific definition-use relationships. In a previous paper, we
presented results from a longitudinal analysis of multiple versions of
the open-source operating system Linux. This paper applies the new
common coupling categorization to version 2.4.20 of Linux, counting the
number of instances of common coupling between each of the 26 kernel
modules and all the other nonkernel modules. We also categorize each
coupling in terms of the definition-use relationships. Results show
that the Linux kernel contains a large number of common couplings of
all types, raising a concern about the long-term maintainability of
Linux.
To anyone with a knowledge of software engineering terminology, whether
gained through formal education or from the University of Life, the first
90% of the abstract is uneventful; this, though, serves to maximize the
impact of the final sentence. A "concern about the long-term
maintainability of Linux," no less. Mr A. Linux Kernel once went to the effort
of writing that reports of his destruction had been exaggerated, but now,
according to this paper, rumours are circulating of a life-threatening illness.
The full paper is available only to subscribers (though one of the authors makes a copy available on his personal website [PDF]), but
we were fortunate to be able to discuss the paper with Andrew Morton,
one of the lead kernel developers, in two contexts: first, in a
general discussion about coupling and kernel maintainability, and then,
after he had read the complete paper, in specific terms related to the thoughts expressed by the authors. As you will see, despite the worries expressed in the paper, the Linux kernel is alive and well.
The researchers, in designing a theoretical model to evaluate the
coupling of Linux, have of necessity made certain assumptions to reduce
complexity and make the problem amenable to a mathematical,
quantitative approach. However, this can lead to inaccurate results:
you may recall the possibly apocryphal
tale of the mathematical
demonstration that bumblebees can't fly. (As an aside, there is
also a parallel here with studies showing operating system X to be
"more secure" than operating system Y, when on closer inspection the
definition of "more secure" is a narrow and potentially misleading, but
easy to calculate, statistic.)
What is coupling?
"Coupling", a term which uses a visual metaphor of mechanical parts
coupled together by a driveshaft, is used widely in software
engineering to describe a link between two parts of a system that is not part of an abstracted
interface. We make this distinction because the parts of a system have
to be linked in some way -- otherwise there would be no system. For the
benefit of Groklaw's less technical readers, I'll try to explain the
concept in non-software terms (kernel developers may skip the next few
paragraphs).
Imagine that the steering wheel of a car was like the steering wheel
one can buy for playing computer driving games -- that is to say, it
merely generated an electrical signal that said "a little bit left,"
"hard to the right," etc. and that this signal was passed to a device
under the bonnet that turned the front wheels. You could replace the
steering wheel with a similarly wired joystick, or anything that
generated an appropriate electrical signal, and you could still drive
the car. We would call this an abstracted
interface. The communication between the two parts (the steering
wheel and the mechanism that turns the front wheels) has been reduced
to its conceptual essence of "I want to go left" and "I want to go
right."
In a typical car, though (especially one without power steering) the
steering wheel is directly and mechanically linked to the front wheels.
You could not easily replace the steering wheel with a joystick,
because the whole mechanism depends on the wheel being turned left and right. Not
only is the interface less abstracted, it is also highly coupled. You can feel bumps
and vibrations coming back up from the wheels on the road. In other
words, the coupled interface means that what happens to one part of the
mechanism (going over a rock) has a knock-on effect on the other
(giving you a pain in the wrists) that wasn't necessarily desired.
Going back to software terms, we would describe modules A and B as
coupled if, to operate properly, A relies on B's internal workings being a certain
way, and vice versa. Just as a traditional steering wheel is sensitive
to holes in the road, A becomes sensitive to changes inside B. That
introduces a risk that when a bug is fixed in B, it may cause an
unexpected problem in A. It is this "knock-on effect" result of
coupling that makes software engineers -- especially when talking
theoretically -- nervous of coupling. They invent approaches like "Model
View Controller" to discipline themselves against thoughtless coupling.
However, I hope that the above example also shows you the other side of
the coin. The high-tech electronic steering wheel was less coupled, but
more complex. There are more elements to go wrong, and a fault may be
harder to find. Also, some drivers would like to "feel the road" via
the steering wheel, and to give this feedback in the electronic system
would require more complex circuitry still. Sometimes, the costs of
eliminating coupling in a system outweigh the gains.
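For readers who prefer code to cars, here is a minimal sketch of the same distinction in C. Everything in it (the `counter` structure, the `counter_*` functions and the two `a_elapsed_*` functions) is hypothetical and invented for illustration; none of it comes from the kernel or from the paper.

```c
/* Hypothetical module B: a counter whose storage is an internal detail. */
struct counter {
    int ticks;  /* internal: B may rename or rescale this field later */
};

/* B's abstracted interface: callers say what they want, not how. */
void counter_init(struct counter *c)       { c->ticks = 0; }
void counter_advance(struct counter *c)    { c->ticks += 1; }
int  counter_read(const struct counter *c) { return c->ticks; }

/* Module A, decoupled: relies only on B's published interface. */
int a_elapsed_decoupled(const struct counter *c)
{
    return counter_read(c);
}

/* Module A, coupled: reaches straight into B's internals. */
int a_elapsed_coupled(const struct counter *c)
{
    return c->ticks;
}
```

Both versions of A return the same answer today. The difference only shows if B changes what `ticks` means: `counter_read` can be adjusted in one place, while `a_elapsed_coupled` would carry on compiling and quietly return the wrong number. The decoupled version pays for that safety with an extra function call, which is the trade-off described above.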
Back to the kernel
The paper focused on data coupling; roughly speaking, this is where two
or more software parts all make direct use of the same area of computer
memory. This can lead to situations where a particular part can have
data changed "behind its back," as it were. The developer has to bear
this in mind when writing the code, which isn't always easy.
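A small sketch may help show what "behind its back" means. The variable `shared_timeout` and the functions `a_deadline` and `b_enter_fast_mode` are made-up names for this illustration only, and are not taken from the kernel.

```c
/* Hypothetical shared global: both modules below touch it directly. */
int shared_timeout = 30;

/* Module A: computes a deadline from the global each time it is asked. */
int a_deadline(int now)
{
    return now + shared_timeout;
}

/* Module B: changes the global for its own reasons. */
void b_enter_fast_mode(void)
{
    shared_timeout = 5;
}
```

Nothing in A's code changes, yet `a_deadline(100)` gives a different answer once B has called `b_enter_fast_mode()`. Whether that is a bug or exactly what was intended depends on the design, which is why counting such variables does not by itself say much about risk.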
We asked Frank Sorenson to read the paper and here is his comment:
Too many dependencies between modules
can obviously be viewed as a bad thing. However, no
coupling/dependencies leads to multiple copies of the same thing, which
is obviously more difficult to maintain. For example, the Linux
kernel contains a library of common functions that may be used in the
various modules. A month or so ago, someone realized that 6
different modules all implemented a 'sort' function, all with the same
interface to the module. This brought about a push to standardize
them, and a single 'sort' function was put into the common function
library.
We've already mentioned that the costs of decoupling aren't always
justified -- this is a case in point. In this instance, increasing the
use of common code -- while increasing the coupling -- reduced the
maintenance requirements.
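As a rough illustration of that consolidation, the sketch below has two hypothetical modules calling one shared sort routine instead of each carrying its own copy. It uses the standard C library's `qsort` purely as a stand-in for a shared routine; it is not the kernel's sort function, and `fs_sort_blocks` and `drv_sort_irqs` are invented names.

```c
#include <stdlib.h>

/* Comparator for ints, in the form qsort() expects. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical module 1: sorts block numbers via the shared routine. */
void fs_sort_blocks(int *blocks, size_t n)
{
    qsort(blocks, n, sizeof *blocks, cmp_int);
}

/* Hypothetical module 2: sorts IRQ numbers via the same shared routine. */
void drv_sort_irqs(int *irqs, size_t n)
{
    qsort(irqs, n, sizeof *irqs, cmp_int);
}
```

Both modules now depend on the same piece of code, so by a strict count the coupling has gone up. In exchange, a bug in the sort needs fixing only once.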
Frank continues:
The article was submitted in July
2003. That's quite a while ago in Linux-kernel-time. A lot
has changed since then, and 2.6.x is (in my
opinion) more maintainable due to being well-engineered from the
beginning. Do the authors have results for the 2.6.x
kernel? How does
the use of global variables change from 2.4.x to 2.6.x?
The kernel maintainers have pushed to make sure that the interface to
kernel functions remains the same. For example, it would not be
acceptable to change the way a common function behaves:
copy_value(source, destination) should not ever change to
copy_value(destination, source) (unless all references are fixed).
Linux modules are generally organized in a hierarchical fashion.
This makes it much harder for a change in one area to affect other
modules or portions of the kernel.
Obviously, what the authors discuss is a very real danger (not
specifically to Linux, but to any sufficiently large project -- such as
Longhorn!). The authors don't offer many valid suggestions on how
to combat the problem. The fact that Linux is open allows them to do
the research, however; the closed nature of Windows prevents people
from seeing how Microsoft has addressed this problem (if at all).
If Linux is too tightly coupled, how about Windows? Having your entire
user interface dependent on a web browser -- now that's coupling!
My personal opinion is that 2.6 is much tidier and more organised
than 2.4, which in turn was tidier than 2.2, etc. The direction of the
Linux kernel is towards a cleaner, less coupled architecture -- there is
an active, ongoing, continuous effort to preserve maintainability.
Indeed, patches are frequently rejected purely on the grounds they harm
maintainability and have to be re-engineered accordingly.
Andrew Morton's comments
However, you probably didn't read this far to hear Frank and me
discussing the kernel, when we have Andrew Morton available. Here's his
initial comment on the abstract:
They examined a kernel (2.4.20) which
is unchanged in this regard from 2.4.15. We've done three and a
half years of development since then!
That being said, I wouldn't be surprised if their analysis showed that
linux-2.6.11 also has a lot of coupling, even though we have done a lot
of improvement work in that and other areas.
But that's OK -- we often do this on purpose, because, although we are
careful about internal interfaces, the kernel is optimized for speed,
and when it comes to trading off speed against maintenance cost, we
will opt for speed. This is because the kernel has a truly
massive amount of development and testing resources. We use it.
More philosophically, I wouldn't find such a study to be directly
useful, really. It represents an attempt to predict the
maintenance cost of a piece of software. But that's not a
predictor of the quality!
If you find that the maintenance cost is high, and the quality is also
high, then you've just discovered that the product has had a large
amount of development resources poured into it. And that is
so. And it is increasing.
If someone wants to use this study to say that "Linux is likely to be
buggy" then I'd say "OK, so show me the bugs". If they're using
it to say "Linux kernel maintenance uses a lot of resources" then I'd
say "Sure. Don't you wish you had such resources?".
Note that I'm not necessarily agreeing with the study. If they
looked at the kernel core then sure, there's a lot of coupling.
But that's a
relatively small amount of code. If they were looking mainly at
filesystem drivers and device drivers (the bulk of the kernel) then I'd
say that
the study is incorrect -- the interfaces into drivers is fairly lean,
and is getting leaner.
Andrew then went on to read the paper in detail. His subsequent
comments were rather different:
AAAARRRGGGGHHH! ... The only thing they've done is look at the use of global variables
and they've assumed that using a global variable is a "bad" coupling.
And look at the naughty global variables which we've used:
jiffies: This is a variable which counts clock ticks. Of course it's global. Unless
they know of a universe in which time advances at more
than one speed at a time.
[Dr S: System time has to be global because time is
a universal throughout the system. We don't usually worry about Einstein
in software development :) ]
And they fail to note that if we did want to "modularize" jiffies,
we'd make a change in a single file:
#define jiffies some_function_which_returns_jiffies()
Other examples such as system_utsname, init_task, panic_timeout,
stop_a_enabled, xtime and 'current' are all by definition singleton objects.
'current' is especially bogus -- this refers to the task
structure for the currently-running task. It's not a global
variable at all, really. If
this is bad, then using the variable 'this' in C++ is also
bad.
Geeze. Who reviewed this?
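To make Andrew's point about the one-line change concrete, here is a minimal sketch of the macro trick in ordinary C. It deliberately uses a made-up name, `ticks`, rather than jiffies, and `tick_count`, `read_ticks`, `timer_tick` and `elapsed_since` are all hypothetical; this is not how the kernel defines its tick counter.

```c
/* Hypothetical stand-in for a tick counter; not the kernel's definition. */
unsigned long tick_count;

/* Accessor that hides where the count actually lives. */
unsigned long read_ticks(void)
{
    return tick_count;
}

/* Stand-in for whatever advances the clock, e.g. a timer interrupt. */
void timer_tick(void)
{
    tick_count++;
}

/* The one-line change: code that says `ticks` now calls the accessor. */
#define ticks read_ticks()

/* Unmodified caller: still written as if `ticks` were a plain global. */
unsigned long elapsed_since(unsigned long start)
{
    return ticks - start;
}
```

Code that merely reads `ticks` does not need to be touched. Code that assigns to it would stop compiling, because a function call is not something you can assign to, so in this sketch the macro also flushes out any writers that would need attention.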
Theory vs Practice
Of course, one can engage in armchair debate endlessly; ultimately,
what is needed is some empirical data against which a model or theory
can be tested. Coupling, like cholesterol, comes in "good" and "bad"
forms. The good form enables a system to work at peak performance,
without introducing excessive maintenance costs. The bad form results
in a system that is increasingly fragile and hard to scale. Which of
these in practice has been uppermost in Linux kernel development?
This kernel
mailing list thread from 2002 -- discussing a kernel of
similar vintage to that covered by the study -- is of interest. Several
people expressed a worry that the kernel would never effectively scale
beyond 4 CPUs -- and coupling was one of the issues:
[2-CPU SMP] makes fine sense for any
tightly coupled system, where the tight
coupling is cost-efficient.
Three years later, have "long-term maintainability" issues in the
Linux kernel held it back? Here's
what Novell said last July [PDF] on the topic:
"More than 128 CPUs have been tested on
available hardware, but theoretically, there is no limit on the number
that will work."
This bumblebee continues to fly.