David Wheeler has just written an article in which he calculates the cost to re-develop the Linux 2.6 kernel. He figures about $612 million. That is the least it is worth, however, as he notes:
"It's worth noting that these approaches only estimate development cost, not value. All proprietary developers invest in development with the presumption that the value of the resulting product (as captured from license fees, support fees, etc.) will exceed the development cost -- if not, they're out of business. Thus, since the Linux kernel is being actively sustained, it's only reasonable to presume that its value far exceeds this development estimate. In fact, the kernel's value probably well exceeds this estimate of simply redevelopment cost."
What is Linux's value, then? A lot. The word billions comes to mind. I enjoyed watching him do the calculations, and I hope you do too. My thanks to him for permission to share this with you on Groklaw.
Linux Kernel 2.6: It's Worth More!
David A. Wheeler
October 12, 2004
This paper refines Ingo Molnar's estimate of the development effort
it would take to redevelop Linux kernel version 2.6.
Molnar's rough estimate found it would cost $176M (US) to
redevelop the Linux kernel using traditional proprietary approaches.
By using a more detailed cost model and much more information about the
Linux kernel, I found that the effort would be
closer to $612M (US)
to redevelop the Linux kernel.
In either case, the Linux kernel is clearly worth far more than the $50,000
proposed by Jeff Merkey.
On October 7, 2004, Jeff V. Merkey made the
following offer on the linux.kernel mailing list:
We offer to kernel.org the sum of $50,000.00 US for a one time
license to the Linux Kernel Source for a single snapshot of
a single Linux version by release number. This offer must be
accepted by **ALL** copyright holders and this snapshot will
subsequently convert the GPL license into a BSD style license
for the code.
Groklaw, for example, included an article that mentioned this proposal. It also noted that someone with the same name is listed on a patent recently obtained by the Canopy Group. SCO is a Canopy Group company. Thus, this proposal raised suspicions in many minds as to Mr. Merkey's motivations.
Many respondents noted that Merkey's proposal
would require complete agreement by all copyright holders.
Not only would such a process be lengthy, but
many copyright holders made it clear in various replies
that they would not agree to any such plan.
Many Linux kernel
developers expect improved versions of their code to be continuously
available to them, and a release using a BSD-style license would
violate those developers' expectations.
Indeed, it was clear that many respondents felt that such a move
would strip the Linux kernel of legal protections
against someone who wanted to monopolize a derived version of the kernel.
Many open source software / Free software (OSS/FS)
developers allow conversion of their OSS/FS programs
to a proprietary program; some even encourage it.
The BSD-style licenses are specifically designed to allow conversion
of an OSS/FS program into a proprietary program.
However, the GPL is the
most popular OSS/FS license, and it was specifically designed
to prevent this.
Based on the thread responses, it's clear that
many Linux kernel developers prefer that the GPL continue to be used as
the Linux kernel license.
In one of the responses,
Ingo Molnar calculated the cost to re-develop the Linux kernel
using my tool SLOCCount.
Molnar didn't specify exactly which version of the Linux kernel he used,
but he did note that it was in the version 2.6 line, and
presumably it was a recent version as of October 2004.
He found that "the Linux 2.6 kernel, if developed from scratch
as commercial software, takes at least this much effort under the
default COCOMO model":
Total Physical Source Lines of Code (SLOC) = 4,287,449
Development Effort Estimate, Person-Years (Person-Months) = 1,302.68 (15,632)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 8.17 (98.10)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 159.35
Total Estimated Cost to Develop = $ 175,974,824
(average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
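The Basic COCOMO figures above can be reproduced with a few lines of arithmetic. This is just a sketch; SLOCCount itself applies the same formulas shown in its output:

```python
# Reproduce the Basic COCOMO estimate shown in the SLOCCount output above.
sloc = 4_287_449
ksloc = sloc / 1000.0

person_months = 2.4 * ksloc ** 1.05            # Basic COCOMO effort
schedule_months = 2.5 * person_months ** 0.38  # Basic COCOMO schedule
developers = person_months / schedule_months   # average staffing (effort/schedule)
cost = (person_months / 12) * 56286 * 2.40     # average salary * overhead

print(f"Effort:   {person_months:,.0f} person-months")
print(f"Schedule: {schedule_months:.1f} months ({schedule_months/12:.2f} years)")
print(f"Staffing: {developers:.1f} developers")
print(f"Cost:     ${cost:,.0f}")
```

The results match Molnar's numbers to within rounding: about 15,632 person-months, a 98-month schedule, and roughly $176 million.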
After noting the redevelopment cost of $176M (US),
Ingo Molnar then commented,
"and you want an unlimited license for $0.05M? What is this, the latest
variant of the Nigerian/419 scam?"
Of course, the value of a product isn't the same as the cost of developing it.
If no one wants to use a software product, then it has no value, no matter
how much was spent developing it.
The value of a proprietary software product to its vendor
can be estimated by
computing the amount of money that the vendor will receive from it over all
future time (via sales, etc.),
minus the costs (development, sustainment, etc.)
over that same time period -- but predicting
the future is extremely difficult, and the Linux kernel isn't a
proprietary product anyway.
Estimating value to users is difficult; in fact,
value is surprisingly difficult to compute directly.
But if a software product is used widely,
so much so that you'd be willing to
redevelop it, then development costs are a reasonable way to estimate
the lower bound of its value.
After all, if you're willing to redevelop a program, then it must have at least
that much value to you.
The Linux kernel is widely used, so its redevelopment cost
gives at least a lower bound on its value.
Thus, Molnar's response is quite correct -- offering $50K for something
that would cost at least $175M to redevelop is ludicrous.
It's true that the kernel developers could continue to develop the
Linux kernel after a BSD-style release; after all, the *BSD operating systems
do this now.
But with a BSD-style release, someone else could take the code
and establish a competing proprietary product, and it would
take time for the kernel developers to add enough additional material
to compete with such a product.
It's not clear that a proprietary vendor could really pick up the Linux
kernel and maintain the same pace without many of the original developers,
but that's a different matter.
Certainly, the scale of the difference between $176M and $50K is enough
to see that the offer is not very much compared to what the offerer
is trying to buy.
But in fact, it's even sillier than it appears; I believe the cost to
redevelop the Linux kernel would actually be much greater than this.
Molnar correctly notes that he used the default Basic COCOMO model
for cost estimation.
This is the default cost model for SLOCCount, because it's
a reasonable model for rough estimates about typical applications.
It's also a reasonable default when
you're examining a large set of software programs at once, since the ranges of
real efforts should eventually average out (this is the approach I used in my
More than a Gigabuck paper).
So, what Molnar did was perfectly reasonable for getting a rough
order of magnitude of effort.
But since there's only one program being considered in this analysis --
the Linux kernel --
we can use a more detailed model to get a more accurate cost estimate.
I was curious what the answer would be.
So I've estimated the effort to create the Linux kernel, using a
more detailed cost model.
This paper shows the results -- and it shows that redeveloping the
Linux kernel would cost even more.
Computing a Better Estimate
To get better accuracy in our estimation,
we need to use a more detailed estimation model.
An obvious alternative, and the one I'll use, is
the Intermediate COCOMO model.
This model requires more information than the Basic COCOMO model,
but it can produce higher-accuracy estimations if you can provide
the data it needs.
We'll also use the version of COCOMO that uses physical SLOC
(since we don't have the logical SLOC counts).
If you don't want to know the details, feel free to skip to the next
section labelled "results".
First, we need to determine whether this is an "organic", "embedded", or
"semidetached" project.
The Linux kernel is clearly not an organic application; organic applications
have a small software team developing software in a familiar,
in-house environment, without significant communication overheads,
and allow hard requirements to be negotiated away.
It could be argued that the Linux kernel is embedded, since it often
operates in tight constraints; but in practice
these constraints aren't very tight,
and the kernel project can often negotiate requirements to a limited extent
(e.g., providing only partial support for a particular peripheral
or motherboard if key documentation is lacking).
While the Linux kernel developers don't ignore resource constraints,
there are no specific constraints that the developers treat as
inflexible, hard limits.
Thus, it appears that the kernel should be considered
a "semidetached" system; this is the
intermediate stage between organic and embedded.
"Semidetached" isn't a very descriptive word, but that's the word used by
the cost model so we'll use it here.
It really just means between the two extremes of organic and embedded.
The intermediate COCOMO model also requires a number of additional parameters.
Here are those parameters, and their values for the Linux kernel
(as I perceive them); the parameter values are based on
Software Engineering Economics by Barry Boehm:
- RELY: Required software reliability: High (1.15). The Linux kernel
is now used in situations where crashes can cause high financial loss.
Even more importantly, Linux
kernel developers expect the kernel to be highly reliable,
and the kernel undergoes extensive worldwide off-nominal testing.
While the testing approach is different from traditional testing regimes,
it clearly produces a highly reliable result (see the reliability
section of my paper
Why OSS/FS? Look at the Numbers!).
- DATA: Data base size: Nominal (1.0). Typically the Linux kernel
manages far larger data bases (file systems) than itself, but it
handles them as somewhat opaque contents, so it's questionable that
those larger sizes can really be counted as being much greater than
nominal. Handling the filesystems'
metadata is itself somewhat complicated, and does take significant
effort, but filesystem management is only one of many things that
the kernel does. So, absent more specific data, we'll
claim it's nominal. If we claim it's higher, and there's reason
for doing so, that would increase the estimated effort.
- CPLX: Product complexity: Extra high (1.65).
The kernel must perform multiple resource handling with dynamically
changing priorities: multiple processes/tasks running on potentially
multiple processors, with multiple kinds of memory, accessing peripherals
which also have various dynamic priorities.
The kernel must deal with device timing-dependent coding, and
with highly coupled dynamic data structures (some of whose structure
is imposed by hardware). In addition, it implements
routines for interrupt servicing and masking, as well as multi-processor
threading and load balancing.
The kernel does have an internal design structure, which helps manage
complexity somewhat, but in the end no design can eliminate the
essential complexity of the task today's kernels are asked to perform.
It's true that toy kernels aren't as complex; requiring single
processors, forbidding re-entry, ignoring resource contention issues,
ignoring error conditions, and a variety of other simplifications
can make a kernel much easier to build, at the cost of poor performance.
But the Linux kernel is no toy.
Real-world operating system kernels are considered extremely difficult
to develop, for a litany of good reasons.
- TIME: Execution time constraint: High (1.11). Although it doesn't need to
stay at less than 70% resource use, performance is an important
design criterion, and much effort has been spent on measuring and
improving the kernel's performance.
- STOR: Main storage constraint: Nominal (1.0). Although there has been
some effort to limit memory use (e.g., 4K kernel stacks), Linux kernel
development has not been strongly constrained by memory.
- VIRT: Virtual machine volatility: High (1.15).
The most common processor (x86) doesn't change that quickly, though new
releases by Intel and AMD do need to be taken into account.
But the other components of underlying machines
(such as motherboards, peripheral and bus interfaces, etc.)
change on a weekly basis. Often the documentation is unavailable,
and when available, it's sometimes wrong.
The Linux kernel developers spend a vast amount of time identifying
hardware limitations/problems and working around them.
What's worse, because of the variety of different hardware (and more
which keeps arriving), the interface of the underlying machine is
actually quite volatile.
- TURN: Computer turnaround time: Nominal (1.0). Kernel recompilation
and rebooting aren't interactive, but they're reasonably fast on
2+ GHz processors. Once the first compilation has occurred,
recompilation is usually quite quick for localized changes.
Thus, there's no reason for this to be a penalty.
- ACAP: Analyst capability: High (0.86). It appears that
those analyzing the system, and determining what should be done in
terms of identifying the "real" requirements and the
needed design modifications to support them,
are significantly better than the industry average.
- AEXP: Applications experience: Nominal (1.0). It's difficult to
determine how much experience with the Linux kernel
the software developers of the Linux kernel have.
Clearly, if you modify the same program day after day for many years,
you'll tend to become more efficient at modifying it.
Some developers, such as Linus Torvalds and Alan Cox,
clearly have a vast amount of experience in modifying the Linux kernel.
But for many other kernel developers it isn't clear that they have
a vast amount of experience modifying the Linux kernel.
In the absence of better information, I've chosen nominal. This suggests that,
on average, developers of the Linux kernel have about 3 years' full-time
experience in modifying the Linux kernel.
More experience on average would help, and would lower the effort estimate.
- PCAP: Programmer capability: High (0.86). Generally only
highly capable, above-average developers (75th percentile or more)
would be successful at helping to develop a kernel.
- VEXP: Virtual machine experience: Nominal (1.0). The x86 processors,
which are by far the most popular for the Linux kernel, are
relatively stable and developers have much experience with them.
But they are not completely stable (e.g., the new 64-bit extensions
for x86 and the NX bit).
The Linux kernel is also influenced by other processor architectures,
which in the aggregate change quite a bit over time.
In addition, most of the kernel is in its drivers for hardware, and this
hardware often acts as a virtual machine as well as a needed interface.
Many driver developers, while experienced in general,
often have less experience with that particular
component, and they often don't have good documentation to help them.
What's worse, hardware components are notorious for not operating as their
specifications proclaim, and the kernel's job is to hide all that.
Thus, this is averaged as nominal, and this is probably being generous.
- LEXP: Programming language experience: High (0.95).
- MODP: Modern programming practices: High - in general use (0.91).
This program is written in C, which lacks structures such as
exception handling, so there is extensive use of "goto" (etc.) to implement
error handling. However, the use of such constructs tends to be
highly stylized and structured, so credit is given for using modern
practices. Some might claim that this is
giving too much credit, but changing this would only make the estimated
effort even larger.
- TOOL: Use of software tools: Nominal (1.0).
- SCED: Required development schedule: Nominal (1.0). There is little
schedule pressure per se, so the "most natural" speed is followed.
So now we can compute a new estimate for how much effort it
would take to re-develop the Linux kernel 2.6:
MM-nominal-semidetached = 3*(KSLOC)^1.12
                        = 3*(4287.449)^1.12 = 35,090 MM
Effort adjustment = 1.15 * 1.0 * 1.65 * 1.11 * 1.0 * 1.15 *
                    1.0 * 0.86 * 1.0 * 0.86 * 1.0 * 0.95 * 0.91 * 1.0 * 1.0
                  = 1.54869
MM-adjusted = 35,090 * 1.54869 = 54,343.6 person-months
            = 4,528.6 person-years of effort to (re)develop
If average salary = $56,286/year and overhead = 2.40, then:
Development cost = 56286 * 2.4 * 4528.6 = $611,757,037
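The arithmetic above can be double-checked with a short script. This is just a sketch of the calculation in this paper; the cost-driver values are the ratings chosen earlier, not canonical constants:

```python
# Intermediate COCOMO, semidetached mode, using the cost-driver
# ratings assigned above for the Linux kernel 2.6.
ksloc = 4287.449

# Nominal effort (person-months) for a semidetached project.
mm_nominal = 3.0 * ksloc ** 1.12

# Effort adjustment factor: the product of all fifteen cost drivers.
drivers = [1.15, 1.0, 1.65, 1.11, 1.0,    # RELY DATA CPLX TIME STOR
           1.15, 1.0, 0.86, 1.0, 0.86,    # VIRT TURN ACAP AEXP PCAP
           1.0, 0.95, 0.91, 1.0, 1.0]     # VEXP LEXP MODP TOOL SCED
eaf = 1.0
for d in drivers:
    eaf *= d

mm_adjusted = mm_nominal * eaf            # adjusted person-months
person_years = mm_adjusted / 12
cost = person_years * 56286 * 2.40        # average salary * overhead

print(f"EAF:    {eaf:.5f}")
print(f"Effort: {mm_adjusted:,.1f} person-months ({person_years:,.1f} person-years)")
print(f"Cost:   ${cost:,.0f}")
```

Running this reproduces the ~1.54869 adjustment factor and the roughly $612 million total, modulo rounding in the intermediate figures.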
In short, it would actually cost about $612 million (US) to re-develop the
Linux kernel 2.6.
Why is this estimate so much larger than Molnar's original estimate?
The answer is that SLOCCount presumes that it's dealing with an
"average" piece of software (i.e., a typical application) unless
it's given parameters that tell it otherwise.
This is usually a reasonable default; almost nothing is as hard
to develop as an operating system kernel.
But operating system kernels
are so much harder to develop that, if you include that difficulty
into the calculation, the effort estimations go way up.
This difficulty shows up in the nominal equation --
semidetached is fundamentally harder, and thus has a larger exponent
in its estimation equation than the default for basic COCOMO.
This difficulty also shows up in factors such as "complexity";
the task the kernel does is fundamentally hard.
The strong capabilities of analysts and developers, use of modern practices,
and programming language experience all help,
but they can only partly compensate; it's still very hard to
develop a modern operating system kernel.
This difference is smoothed over in my paper
More than a Gigabuck because that paper
includes a large number of applications.
Some of the applications would cost less than was estimated, while
others would cost more; in general you'd expect that by computing the
costs over many programs the differences would be averaged out.
Providing that sort of information for every program would have been
too time-consuming for the limited time I had available to write that paper,
and I often didn't have that much information anyway.
If I do such a study again, I might treat the kernel specially, since
the kernel's size and complexity makes it reasonable to treat specially.
SLOCCount actually has options that allow you to provide the
parameters for more accurate estimates,
if you have the information they need and you're willing
to take the time to provide them.
Since the nominal factor is 3, the adjustment factor for this situation
is 1.54869 (so 3 * 1.54869 = 4.646), and the exponent for semidetached
projects is 1.12, simply providing SLOCCount with
the option "--effort 4.646 1.12"
would have produced this more accurate estimate.
But as you can see, it takes much more work to use this more
detailed estimation model, which is why many people don't do it.
For many situations, a rough estimate is really all you need;
Molnar certainly didn't need a more exact estimate to make his point.
And being able to give a rough estimate when given
little information is quite useful.
In the end, Ingo Molnar's response is still exactly correct.
Offering $50K for something
that would cost hundreds of millions of dollars to redevelop, and that is
actively used and supported, is absurd.
It's interesting to note that there are already
several kernels with BSD licenses: the *BSDs (particularly
FreeBSD, OpenBSD, and NetBSD).
These are fine operating systems for many purposes;
indeed, my website currently runs on OpenBSD.
But clearly, if there is a monetary offer to buy Linux code,
the Linux kernel developers must be doing something right.
Certainly, from a market share perspective, Linux-based systems are far
more popular than BSD-based systems.
If you just want a kernel licensed under a BSD-style license,
you know where to find them.
It's worth noting that these approaches only estimate development cost,
not value.
All proprietary developers invest in development with the presumption
that the value of the resulting product (as captured from license fees,
support fees, etc.) will exceed the development cost -- if not, they're
out of business.
Thus, since the Linux kernel is being actively sustained, it's only
reasonable to presume that its value far exceeds this development estimate.
In fact, the kernel's value probably well exceeds this estimate of
simply redevelopment cost.
It's also worth noting that the Linux kernel has grown substantially.
That's not surprising, given the explosion in the number of peripherals
and situations that it supports.
In my paper Estimating Linux's Size,
I used a Linux distribution released in March 2000,
and found that the Linux kernel had 1,526,722 physical source lines of code.
By the time of More than a Gigabuck,
the Linux distribution examined had been released in April 2001, and its
kernel (version 2.4.2) was 2,437,470 physical source lines of code.
At that point, this Linux distribution would have cost more
than $1 Billion (a Gigabuck) to redevelop.
The much newer and larger Linux kernel considered here, with far more
drivers and capabilities than the one in that paper,
now has 4,287,449 physical source lines of code, and
is starting to approach a Gigabuck of effort all by itself.
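A quick back-of-the-envelope check of what those three size figures imply for growth (the elapsed time is my approximation from the dates in the text):

```python
# Kernel sizes quoted in this paper, with approximate dates.
sloc_mar_2000 = 1_526_722   # distribution released March 2000
sloc_apr_2001 = 2_437_470   # kernel 2.4.2, April 2001
sloc_oct_2004 = 4_287_449   # kernel 2.6, October 2004

years = 4.5                               # roughly March 2000 to October 2004
ratio = sloc_oct_2004 / sloc_mar_2000
annual_growth = ratio ** (1 / years) - 1  # compound annual growth rate

print(f"Growth: {ratio:.2f}x over about {years} years")
print(f"Roughly {annual_growth:.0%} more kernel code every year")
```

The kernel nearly tripled in four and a half years, a compound growth rate in the neighborhood of 25% per year.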
And that's just the kernel.
There are other components that weren't included in More than a Gigabuck
(such as OpenOffice.org) that are now common in Linux distributions,
which are also large and represent massive investments of effort.
More than a Gigabuck
noted the massive rise in size and scale
of OSS/FS systems, and that distributions were rapidly growing in
invested effort; this brief analysis is evidence that the trend continues.
In short, the amount of effort that today's OSS/FS programs represent
is rather amazing.
Carl Sagan's phrase "billions and billions," which he applied to
astronomical objects, easily applies to the effort
(measured in U.S. dollars) now invested in OSS/FS programs.
I'd like to thank Ingo Molnar for doing the original analysis
(using SLOCCount) that triggered this paper.
Indeed, I'm always delighted to see people doing analysis instead of
merely speculating. Thanks for doing the analysis!
This paper is not in any way an attack on Molnar's work; Molnar computed
a quick estimate, and this paper simply uses more data to refine his
effort estimation further.
Feel free to visit my home page.
You may also want to look at my papers
More than a Gigabuck: Estimating GNU/Linux's Size and
Why OSS/FS? Look at
the Numbers!, and my papers and book on
how to develop secure programs.
© Copyright 2004 David A. Wheeler. All rights reserved.