How Much is the Linux Kernel Worth?
Wednesday, October 13 2004 @ 08:32 AM EDT

David Wheeler has just written an article in which he calculates the cost to re-develop the Linux 2.6 kernel. He figures about $612 million. That is the least it is worth, however, as he notes:

"It's worth noting that these approaches only estimate development cost, not value. All proprietary developers invest in development with the presumption that the value of the resulting product (as captured from license fees, support fees, etc.) will exceed the development cost -- if not, they're out of business. Thus, since the Linux kernel is being actively sustained, it's only reasonable to presume that its value far exceeds this development estimate. In fact, the kernel's value probably well exceeds this estimate of simply redevelopment cost."

What is Linux's value, then? A lot. The word billions comes to mind. I enjoyed watching him do the calculations, and I hope you do too. My thanks to him for permission to share this with you on Groklaw.

****************************

Linux Kernel 2.6: It's Worth More!

David A. Wheeler
October 12, 2004

This paper refines Ingo Molnar's estimate of the development effort it would take to redevelop Linux kernel version 2.6. Molnar's rough estimate found it would cost $176M (US) to redevelop the Linux kernel using traditional proprietary approaches. By using a more detailed cost model and much more information about the Linux kernel, I found that the effort would be closer to $612M (US) to redevelop the Linux kernel. In either case, the Linux kernel is clearly worth far more than the $50,000 proposed by Jeff Merkey.

Introduction

On October 7, 2004, Jeff V. Merkey made the following offer on the linux.kernel mailing list:

We offer to kernel.org the sum of $50,000.00 US for a one time license to the Linux Kernel Source for a single snapshot of a single Linux version by release number. This offer must be accepted by **ALL** copyright holders and this snapshot will subsequently convert the GPL license into a BSD style license for the code.

Groklaw, for example, included an article that mentioned this proposal. It also noticed that someone with the same name is listed on a patent recently obtained by the Canopy Group. SCO is a Canopy Group company. Thus, this proposal raised suspicions among many about Mr. Merkey's motivations.

Many respondents noted that Merkey's proposal would require complete agreement by all copyright holders. Not only would such a process be lengthy, but many copyright holders made it clear in various replies that they would not agree to any such plan. Many Linux kernel developers expect improved versions of their code to be continuously available to them, and a release using a BSD-style license would violate those developers' expectations. Indeed, it was clear that many respondents felt that such a move would strip the Linux kernel of legal protections against someone who wanted to monopolize a derived version of the kernel. Many open source software / Free software (OSS/FS) developers allow conversion of their OSS/FS programs to a proprietary program; some even encourage it. The BSD-style licenses are specifically designed to allow conversion of an OSS/FS program into a proprietary program. However, the GPL is the most popular OSS/FS license, and it was specifically designed to prevent this. Based on the thread responses, it's clear that many Linux kernel developers prefer that the GPL continue to be used as the Linux kernel license.

In one of the responses, Ingo Molnar calculated the cost to re-develop the Linux kernel using my tool SLOCCount. Molnar didn't specify exactly which version of the Linux kernel he used, but he did note that it was in the version 2.6 line, and presumably it was a recent version as of October 2004. He found that "the Linux 2.6 kernel, if developed from scratch as commercial software, takes at least this much effort under the default COCOMO model":

 Total Physical Source Lines of Code (SLOC)                = 4,287,449
 Development Effort Estimate, Person-Years (Person-Months) = 1,302.68 (15,632)
  (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
 Schedule Estimate, Years (Months)                         = 8.17 (98.10)
  (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
 Estimated Average Number of Developers (Effort/Schedule)  = 159.35
 Total Estimated Cost to Develop                           = $ 175,974,824
  (average salary = $56,286/year, overhead = 2.40).
 SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
 Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
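
For readers who want to check these figures, here is a minimal Python sketch. It is not SLOCCount itself, just a re-statement of the Basic COCOMO arithmetic using the constants shown in the output above:

# Basic COCOMO arithmetic behind the SLOCCount output above.
ksloc = 4287.449                                # thousands of physical SLOC
person_months = 2.4 * ksloc ** 1.05             # Basic COCOMO effort
schedule_months = 2.5 * person_months ** 0.38   # Basic COCOMO schedule
developers = person_months / schedule_months    # average staffing level
salary, overhead = 56286, 2.40                  # $/year and overhead multiplier
cost = (person_months / 12) * salary * overhead
print(f"{person_months:,.0f} person-months, {schedule_months:.1f} months schedule")
print(f"{developers:.0f} developers, cost ${cost:,.0f}")
# Prints roughly 15,632 person-months, 98.1 months, 159 developers,
# and about $176 million, matching the figures above to within rounding.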

After noting the redevelopment cost of $176M (US), Ingo Molnar then commented, "and you want an unlimited license for $0.05M? What is this, the latest variant of the Nigerian/419 scam?"

Strictly speaking, the value of a product isn't the same as the cost of developing it. For example, if no one wants to use a software product, then it has no value, no matter how much was spent in developing it. The value of a proprietary software product to its vendor can be estimated by computing the amount of money that the vendor will receive from it over all future time (via sales, etc.), minus the costs (development, sustainment, etc.) over that same time period -- but predicting the future is extremely difficult, and the Linux kernel isn't a proprietary product anyway. The value to users is even harder to estimate; in fact, value is surprisingly difficult to compute directly at all. But if a software product is used widely, so much so that you'd be willing to redevelop it, then development costs are a reasonable way to estimate the lower bound of its value. After all, if you're willing to redevelop a program, then it must have at least that value. The Linux kernel is widely used, so its redevelopment costs will at least give you a lower bound of its value.

Thus, Molnar's response is quite correct -- offering $50K for something that would cost at least $176M to redevelop is ludicrous. It's true that the kernel developers could continue to develop the Linux kernel after a BSD-style release; after all, the *BSD operating systems do this now. But with a BSD-style release, someone else could take the code and establish a competing proprietary product, and it would take time for the kernel developers to add enough additional material to compete with such a product. It's not clear that a proprietary vendor could really pick up the Linux kernel and maintain the same pace without many of the original developers, but that's a different matter. Certainly, the scale of the difference between $176M and $50K is enough to see that the offer is not very much compared to what the offerer is trying to buy.

But in fact, it's even sillier than it appears; I believe the cost to redevelop the Linux kernel would actually be much greater than this. Molnar correctly notes that he used the default Basic COCOMO model for cost estimation. This is the default cost model for SLOCCount, because it's a reasonable model for rough estimates about typical applications. It's also a reasonable default when you're examining a large set of software programs at once, since the ranges of real efforts should eventually average out (this is the approach I used in my More than a Gigabuck paper). So, what Molnar did was perfectly reasonable for getting a rough order of magnitude of effort.

But since there's only one program being considered in this analysis -- the Linux kernel -- we can use a more detailed model to get a more accurate cost estimate. I was curious what the answer would be. So I've estimated the effort to create the Linux kernel, using a more detailed cost model. This paper shows the results -- and it shows that redeveloping the Linux kernel would cost even more.

Computing a Better Estimate

To get better accuracy in our estimation, we need to use a more detailed estimation model. An obvious alternative, and the one I'll use, is the Intermediate COCOMO model. This model requires more information than the Basic COCOMO model, but it can produce higher-accuracy estimations if you can provide the data it needs. We'll also use the version of COCOMO that uses physical SLOC (since we don't have the logical SLOC counts). If you don't want to know the details, feel free to skip to the next section labelled "results".

First, we now need to determine if this is an "organic", "embedded", or "semidetached" application. The Linux kernel is clearly not an organic application; organic applications have a small software team developing software in a familiar, in-house environment, without significant communication overheads, and allow hard requirements to be negotiated away. It could be argued that the Linux kernel is embedded, since it often operates in tight constraints; but in practice these constraints aren't very tight, and the kernel project can often negotiate requirements to a limited extent (e.g., providing only partial support for a particular peripheral or motherboard if key documentation is lacking). While the Linux kernel developers don't ignore resource constraints, there are no specific constraints that the developers feel are strictly required. Thus, it appears that the kernel should be considered a "semidetached" system; this is the intermediate stage between organic and embedded. "Semidetached" isn't a very descriptive word, but that's the word used by the cost model so we'll use it here. It really just means between the two extremes of organic and embedded.

The intermediate COCOMO model also requires a number of additional parameters. Here are those parameters, and their values for the Linux kernel (as I perceive them); the parameter values are based on Software Engineering Economics by Barry Boehm:

  • RELY: Required software reliability: High (1.15). The Linux kernel is now used in situations where crashes can cause high financial loss. Even more importantly, Linux kernel developers expect the kernel to be highly reliable, and the kernel undergoes extensive worldwide off-nominal testing. While the testing approach is different than traditional testing regimes, it clearly produces a highly reliable result (see the Reliability section of my paper Why OSS/FS? Look at the Numbers!).
  • DATA: Data base size: Nominal (1.0). Typically the Linux kernel manages far larger data bases (file systems) than itself, but it handles them as somewhat opaque contents, so it's questionable that those larger sizes can really be counted as being much greater than nominal. Handling the filesystems' metadata is itself somewhat complicated, and does take significant effort, but filesystem management is only one of many things that the kernel does. So, absent more specific data, we'll claim it's nominal. If we claim it's higher, and there's reason for doing so, that would increase the estimated effort.
  • CPLX: Product complexity: Extra high (1.65). The kernel must perform multiple resource handling with dynamically changing priorities: multiple processes/tasks running on potentially multiple processors, with multiple kinds of memory, accessing peripherals which also have various dynamic priorities. The kernel must deal with device timing-dependent coding, and with highly coupled dynamic data structures (some of whose structure is imposed by hardware). In addition, it implements routines for interrupt servicing and masking, as well as multi-processor threading and load balancing. The kernel does have an internal design structure, which helps manage complexity somewhat, but in the end no design can eliminate the essential complexity of the task today's kernels are asked to perform. It's true that toy kernels aren't as complex; requiring single processors, forbidding re-entry, ignoring resource contention issues, ignoring error conditions, and a variety of other simplifications can make a kernel much easier to build, at the cost of poor performance. But the Linux kernel is no toy. Real-world operating system kernels are considered extremely difficult to develop, for a litany of good reasons.
  • TIME: Execution time constraint: High (1.11). Although it doesn't need to stay at less than 70% resource use, performance is an important design criterion, and much effort has been spent on measuring and improving performance.
  • STOR: Main storage constraint: Nominal (1.0). Although there has been some effort to limit memory use (e.g., 4K kernel stacks), Linux kernel development has not been strongly constrained by memory.
  • VIRT: Virtual machine volatility: High (1.15). The most common processor (x86) doesn't change that quickly, though new releases by Intel and AMD do need to be taken into account. But the other components of underlying machines (such as motherboards, peripheral and bus interfaces, etc.) change on a weekly basis. Often the documentation is unavailable, and when available, it's sometimes wrong. The Linux kernel developers spend a vast amount of time identifying hardware limitations/problems and working around them. What's worse, because of the variety of different hardware (and more which keeps arriving), the interface of the underlying machine is actually quite volatile.
  • TURN: Computer turnaround time: Nominal (1.0). Kernel recompilation and rebooting aren't interactive, but they're reasonably fast on 2+ GHz processors. Once the first compilation has occurred, recompilation is usually quite quick for localized changes. Thus, there's no reason for this to be a penalty.
  • ACAP: Analyst capability: High (0.86). It appears that those analyzing the system, identifying the "real" requirements and the design modifications needed to support them, are significantly better than the industry average.
  • AEXP: Applications experience: Nominal (1.0). It's difficult to determine how much experience with the Linux kernel the software developers of the Linux kernel have. Clearly, if you modify the same program day after day for many years, you'll tend to become more efficient at modifying it. Some developers, such as Linus Torvalds and Alan Cox, clearly have a vast amount of experience in modifying the Linux kernel. But for many other kernel developers it isn't clear that they have a vast amount of experience modifying the Linux kernel. In absence of better information, I've chosen nominal. This suggests that on average, developers of the Linux kernel have about 3 years' full-time experience in modifying the Linux kernel. More experience on average would help, and lower the effort estimation somewhat.
  • PCAP: Programmer capability: High (0.86). Generally only highly capable, above-average developers (75th percentile or more) would be successful at helping to develop a kernel.
  • VEXP: Virtual machine experience: Nominal (1.0). The x86 processors, which are by far the most popular for the Linux kernel, are relatively stable and developers have much experience with them. But they are not completely stable (e.g., the new 64-bit extensions for x86 and the NX bit). The Linux kernel is also influenced by other processor architectures, which in the aggregate change quite a bit over time. In addition, most of the kernel is in its drivers for hardware, and this hardware often acts as a virtual machine as well as a needed interface. Many driver developers, while experienced in general, often have less experience with that particular component, and they often don't have good documentation to help them. What's worse, hardware components are notorious for not operating as their specifications proclaim, and the kernel's job is to hide all that. Thus, this is averaged as nominal, and this is probably being generous.
  • LEXP: Programming language experience: High (0.95).
  • MODP: Modern programming practices: High - in general use (0.91). This program is written in C, which lacks structures such as exception handling, so there is extensive use of "goto" (etc.) to implement error handling. However, the use of such constructs tends to be highly stylized and structured, so credit is given for using modern practices. Some might claim that this is giving too much credit, but changing this would only make the estimated effort even larger.
  • TOOL: Use of software tools: Nominal (1.0).
  • SCED: Required development schedule: Nominal (1.0). There is little schedule pressure per se, so the "most natural" speed is followed.
Results

So now we can compute a new estimate for how much effort it would take to re-develop the Linux kernel 2.6:

MM-nominal-semidetached = 3*(KSLOC)^1.12 =
  = 3* (4287.449)^1.12 = 35,090 MM
Effort-adjustment =  1.15 * 1.0 * 1.65 * 1.11 * 1.0 * 1.15 *
    1.0 * 0.86 * 1.0 * 0.86 * 1.0 * 0.95 * 0.91 * 1.0 * 1.0
    = 1.54869
MM-adjusted = 35,090 * 1.54869 = 54,343.6 Man-Months
            = 4,528.6 Man-years of effort to (re)develop
If average salary = $56,286/year, and overhead = 2.40, then:
Development cost = 56286*2.4*4528.6 = $611,757,037
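
The arithmetic above can be checked with a small Python sketch (again, not SLOCCount itself; the only inputs are the semidetached constants and the fifteen cost-driver values listed in the previous section):

# Intermediate COCOMO, semidetached mode, using the cost drivers above.
ksloc = 4287.449       # thousands of physical SLOC in the measured 2.6 kernel
# RELY, DATA, CPLX, TIME, STOR, VIRT, TURN, ACAP,
# AEXP, PCAP, VEXP, LEXP, MODP, TOOL, SCED
multipliers = [1.15, 1.0, 1.65, 1.11, 1.0, 1.15, 1.0, 0.86,
               1.0, 0.86, 1.0, 0.95, 0.91, 1.0, 1.0]
eaf = 1.0
for m in multipliers:
    eaf *= m                                 # effort adjustment factor, ~1.54869
nominal_mm = 3.0 * ksloc ** 1.12             # ~35,090 person-months (semidetached)
adjusted_mm = nominal_mm * eaf               # ~54,344 person-months
person_years = adjusted_mm / 12              # ~4,529 person-years
salary, overhead = 56286, 2.40
cost = person_years * salary * overhead      # ~$612 million
print(f"EAF {eaf:.5f}, {adjusted_mm:,.0f} person-months, cost ${cost:,.0f}")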

In short, it would actually cost about $612 million (US) to re-develop the Linux kernel.

Why is this estimate so much larger than Molnar's original estimate? The answer is that SLOCCount presumes that it's dealing with an "average" piece of software (i.e., a typical application) unless it's given parameters that tell it otherwise. This is usually a reasonable default, because almost nothing is as hard to develop as an operating system kernel. But operating system kernels are so much harder to develop that, if you include that difficulty in the calculation, the effort estimations go way up. This difficulty shows up in the nominal equation: semidetached is fundamentally harder, and thus has a larger exponent in its estimation equation than the default for basic COCOMO. This difficulty also shows up in factors such as "complexity"; the task the kernel does is fundamentally hard. The strong capabilities of analysts and developers, use of modern practices, and programming language experience all help, but they can only partly compensate; it's still very hard to develop a modern operating system kernel.

This difference is smoothed over in my paper More than a Gigabuck because that paper includes a large number of applications. Some of the applications would cost less than was estimated, while others would cost more; in general you'd expect that by computing the costs over many programs the differences would average out. Providing that sort of information for every program would have been too time-consuming for the limited time I had available to write that paper, and I often didn't have that much information anyway. If I do such a study again, I might treat the kernel specially, since the kernel's size and complexity make it reasonable to treat specially. SLOCCount actually has options that allow you to provide the parameters for more accurate estimates, if you have the information they need and you're willing to take the time to provide them. Since the nominal factor is 3, the adjustment for this situation is 1.54869, and the exponent for semidetached projects is 1.12, just providing SLOCCount with the option "--effort 4.646 1.12" would have created a more accurate estimate. But as you can see, it takes much more work to use this more detailed estimation model, which is why many people don't do it. For many situations, a rough estimate is really all you need; Molnar certainly didn't need a more exact estimate to make his point. And being able to give a rough estimate when given little information is quite useful.
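
As a concrete illustration, the calculation in this paper corresponds roughly to an invocation along the following lines (the directory name linux-2.6 is just a placeholder for wherever the kernel source is unpacked; --effort, --personcost, and --overhead are the relevant SLOCCount options):

 sloccount --effort 4.646 1.12 --personcost 56286 --overhead 2.40 linux-2.6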

In the end, Ingo Molnar's response is still exactly correct. Offering $50K for something that would cost hundreds of millions of dollars to redevelop, and is actively used and supported, is absurd.

It's interesting to note that there are already several kernels with BSD licenses: the *BSDs (particularly FreeBSD, OpenBSD, and NetBSD). These are fine operating systems for many purposes; indeed, my website currently runs on OpenBSD. But clearly, if there is a monetary offer to buy Linux code, the Linux kernel developers must be doing something right. Certainly, from a market share perspective, Linux-based systems are far more popular than BSD-based systems. If you just want a kernel licensed under a BSD-style license, you know where to find them.

It's worth noting that these approaches only estimate development cost, not value. All proprietary developers invest in development with the presumption that the value of the resulting product (as captured from license fees, support fees, etc.) will exceed the development cost -- if not, they're out of business. Thus, since the Linux kernel is being actively sustained, it's only reasonable to presume that its value far exceeds this development estimate. In fact, the kernel's value probably well exceeds this estimate of simply redevelopment cost.

It's also worth noting that the Linux kernel has grown substantially. That's not surprising, given the explosion in the number of peripherals and situations that it supports. In Estimating Linux's Size, I used a Linux distribution released in March 2000, and found that the Linux kernel had 1,526,722 physical source lines of code. In More than a Gigabuck, the Linux distribution had been released in April 2001, and its kernel (version 2.4.2) was 2,437,470 physical source lines of code. At that point, this Linux distribution would have cost more than $1 billion (a Gigabuck) to redevelop. The much newer and larger Linux kernel considered here, with far more drivers and capabilities than the one in that paper, now has 4,287,449 physical source lines of code, and is starting to approach a Gigabuck of effort all by itself. And that's just the kernel. There are other components that weren't included in More than a Gigabuck (such as OpenOffice.org) that are now common in Linux distributions, which are also large and represent massive investments of effort. More than a Gigabuck noted the massive rise in the size and scale of OSS/FS systems, and that distributions were rapidly growing in invested effort; this brief analysis is evidence that the trend continues.

In short, the amount of effort that today's OSS/FS programs represent is rather amazing. Carl Sagan's phrase "billions and billions," which he applied to astronomical objects, easily applies to the effort (measured in U.S. dollars) now invested in OSS/FS programs.

A Postscript

I'd like to thank Ingo Molnar for doing the original analysis (using SLOCCount) that triggered this paper. Indeed, I'm always delighted to see people doing analysis instead of just guesswork. Thanks for doing the analysis! This paper is not in any way an attack on Molnar's work; Molnar computed a quick estimate, and this paper simply uses more data to refine his effort estimation further.

Feel free to see my home page at http://www.dwheeler.com. You may also want to look at my paper More than a Gigabuck: Estimating GNU/Linux's Size, my article Why OSS/FS? Look at the Numbers!, and my papers and book on how to develop secure programs.


Copyright 2004 David A. Wheeler. All rights reserved.

