Here is UNIX Heritage Society's Warren Toomey's second article on the ABI files, as promised. The first, as you recall, looked at errno.h. Now it's time, he writes, "to turn our attention to SCO's assertion that signal.h was one of the files involved in the 'line-for-line copying of UNIX System V code [which] improperly appears in Linux'".
To determine if their accusation is well-founded or not, we need to understand what signal.h is, what's in it, and a bit of its history.
It's important
to point out that there are two versions of signal.h in most versions
of UNIX ( /usr/include/signal.h and /usr/include/sys/signal.h), and as
yet -- to the best of our collective knowledge -- SCO Group has
not specified which, if either, is the file they claim has
been improperly copied. The same is true of errno.h.
We have yet to see SCO list any "UNIX Derived Files" publicly, for that matter. The files SCO mentions in their Revised Supplemental Responses to IBM's 1st and 2nd Set of Interrogatories are all from AIX, Dynix and Linux, although on page 59 it references an Exhibit A that SCO says lists them. However, Exhibit A is not attached to the publicly available Revised Supplemental Responses, at least not yet. SCO has referenced UNIX files being attached to letters sent to their "dear Unix licensees" ("A complete listing of the UNIX Derived Files is attached"), but so far we have not heard of anyone actually getting the attachment with the letter. In Red Hat's most recent
filing, they include the letter to Lehman Brothers, which also references the attachment,
but again, there is no such attachment in the public court filing. Has anyone who got a letter from SCO received this attachment listing "UNIX Derived Files"?
**************************************************
Signal.h
~ by Warren Toomey
Introduction
Following on from my report into errno.h in Linux, it's time
to turn our attention to SCO's assertion
that signal.h was one of the files involved in the "line-for-line
copying of UNIX System V code [which] improperly appears in Linux''
and that "persons as yet unknown copied these files into Linux,
erasing the USL copyright attribution in the process''.
In Unix and Unix-like systems, the underlying operating system can
send a message to a running program to inform it of some exceptional
event: a signal. The program's execution is diverted to a signal
handler which deals with the event, before returning the program
to what it was originally doing.
The sort of events that can occur are numerous: access to an 'out
of bounds' area of memory, a divide by zero operation, a signal to
stop executing from the user, etc. For (nearly) each signal type on
the system, a running program can decide to ignore the signal, catch
the signal and deal with it, or simply let the default Unix behaviour
happen for that signal type. Most signals if uncaught result in the
program being terminated, and the SIGKILL signal can never be caught:
it is the "terminate with extreme prejudice'' signal in Unix.
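To make these three choices concrete, here is a minimal sketch in C, assuming a POSIX system. The names note_signal, caught_signal, ignored_signal and sigkill_is_catchable are my own inventions for illustration; they do not come from any of the header files discussed in this article.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

/* Written by the handler; sig_atomic_t can be stored safely from one. */
static volatile sig_atomic_t last_signal = 0;

/* Illustrative handler: remember which signal arrived. */
static void note_signal(int sig)
{
    last_signal = sig;
}

/* Catch: install a handler, send ourselves SIGUSR1, report what it saw. */
int caught_signal(void)
{
    void (*old)(int) = signal(SIGUSR1, note_signal);
    assert(old != SIG_ERR);
    last_signal = 0;
    raise(SIGUSR1);           /* the handler runs before raise() returns */
    signal(SIGUSR1, old);     /* restore the previous disposition */
    return last_signal;
}

/* Ignore: the signal is discarded, so no handler runs and we survive. */
int ignored_signal(void)
{
    void (*old)(int) = signal(SIGUSR1, SIG_IGN);
    assert(old != SIG_ERR);
    last_signal = 0;
    raise(SIGUSR1);
    signal(SIGUSR1, old);     /* restore the previous disposition */
    return last_signal;
}

/* SIGKILL: any attempt to catch or ignore it is refused with SIG_ERR. */
int sigkill_is_catchable(void)
{
    return signal(SIGKILL, note_signal) != SIG_ERR
        || signal(SIGKILL, SIG_IGN) != SIG_ERR;
}
```

Called from a test program, caught_signal() should return SIGUSR1, ignored_signal() should return 0, and sigkill_is_catchable() should return 0. The third choice, the default behaviour, is deliberately not exercised: for SIGUSR1 the default is to terminate the program.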
To have a valid assertion that "line-for-line copying of UNIX
System V code . . . improperly appears in Linux'' for signal.h,
SCO needs to demonstrate that the signal names, their numeric values,
any associated program comments and other function definitions could
only have been directly copied from System V to Linux, and from nowhere
else. Our job here is to track down the origins of signal.h
in Linux.
What's in Signal.h?
What's in a typical signal.h file on most Unix or Unix-like
systems? First of all, there is a set of defined signal names, their
values, and possibly a C comment describing the signal. Systems which
comply with the POSIX
standard need to define about 28 signal names and associated numeric
values; the values are not defined by the POSIX standard, but nearly
every Unix and Unix-like system uses the same numbering scheme.
The earliest version of the signal name/numbering scheme still in
existence is the nsys/param.h
file from the 3rd Edition of UNIX in August 1973, with 12 defined
signals. As Unix grew, so too did the number of signals, and by the
7th Edition of UNIX
and the 32V distribution in 1979, the file now called signal.h
had 15 signals.
By the end of the 1970s, there were already Unix clones like Idris
and Coherent, and of course they also had to enumerate the set of
signals. Not surprisingly, they followed the same numbering convention
as Unix, as is shown by this file from Idris
in 1978, where nearly all of the names and numbers are derived from
6th Edition UNIX.
This sort of code "cloning'' is exactly the thing that seems
to make SCO see red. However, when AT&T asked Dennis Ritchie
(one of the developers of Unix) to visit Coherent's makers, the Mark
Williams Company, and determine whether they had relied on Unix code
when they wrote Coherent, Dennis determined that he "couldn't
find anything that was copied'', and that "what they generated was
[...] reproducible from the [Unix] manual''. It must be remembered
that the manual pages for Unix were published and publicly available;
in fact, each new version of Unix was known by the edition of the
printed manuals.
Dennis goes on to indicate that AT&T "backed off, possibly after
other thinking and investigation [... and] so far as I know, after
that MWC and Coherent were free to offer their system and allow it
to succeed or fail in the market''. This decision and others like
it, together with the publicly available enumeration of the signal
values, allowed the Unix signal numbers to be used in many Unix clones
and non-Unix systems such as:
The list is probably endless; hyperlinks to other examples of the
Unix numbering in non-Unix systems can be posted as replies to this
article.[1]
We've digressed from the topic of "What's in signal.h?''
to observing that the contents of the original Unix file was copied
with AT&T's knowledge as early as 1978. Let's get back to what is
in a typical signal.h file.
Along with the list of signal types, there is a list of operations
that a running program can do when a signal arrives. Typically:
- SIG_IGN (usually 1): ignore the signal
- SIG_DFL (usually 0): use the default system behaviour, and
- have the program handle the signal.
There is no numeric definition for the program handling the signal
itself. Instead, signal.h defines a prototype for the signal()
function. This system function takes two arguments: the signal number
to catch, and the name of a program-specified function that will catch
it. This program-specific function must receive an integer (the number
of the signal that has arrived) but not return any value. These days,
example definitions of the program-specific function and the signal()
function might look like:
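typedef void __sighandler_t __P((int)); /* the handler: takes an int, returns nothing */
typedef __sighandler_t *sig_t; /* a pointer to such a handler */
sig_t signal(int sig, sig_t func);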
Earlier versions of signal.h often rolled both definitions
into one line, giving an unreadable definition like:
void (* signal(int sig, void (*func)(int)))(int);
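To show that the two spellings really do describe the same thing, here is a small sketch, assuming a standard C compiler and its signal.h. The names handler_fn, readable_form, compact_form, on_signal and install_both_ways are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <signal.h>

/* Illustrative name: pointer to a function taking int, returning nothing. */
typedef void (*handler_fn)(int);

/* Two spellings of the same function-pointer type. The compiler would
   complain about either initialisation if the type did not match the
   signal() declared in <signal.h>. */
static handler_fn (*readable_form)(int, handler_fn) = signal;
static void (*(*compact_form)(int, void (*)(int)))(int) = signal;

static volatile sig_atomic_t seen = 0;

static void on_signal(int sig)
{
    seen = sig;
}

/* Install a handler through one spelling, restore through the other. */
int install_both_ways(void)
{
    handler_fn old = readable_form(SIGINT, on_signal);
    assert(old != SIG_ERR);
    seen = 0;
    raise(SIGINT);                /* on_signal() records the signal number */
    compact_form(SIGINT, old);    /* put the previous disposition back */
    return seen;
}
```

Both pointers end up referring to the same library function, and install_both_ways() returns SIGINT, the number handed to the handler.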
The behaviour of signals and their handlers in Unix has changed dramatically
over time, and now the whole signal system is mind-bogglingly complex.
The POSIX
standard lists many, many more type definitions and function definitions
that must be found in modern signal.h files.
Signal.h in Linux 0.01
Linus Torvalds released version 0.01 of the Linux kernel source around
the "middle of [19]91'',
and this includes the kernel file linux/include/signal.h.
We have:
- the usual #ifndef _SIGNAL_H, #define _SIGNAL_H ... #endif /*
_SIGNAL_H */ combination to stop this file from being loaded into
the compiler more than once;
- an #include to bring in definitions of common C types;
- definitions of two new C types:
typedef int sig_atomic_t;
typedef unsigned int sigset_t; /* 32 bits */
- the list of 22 signal names and values, plus a definition that NSIG
(the number of signals) equals 32;
- definitions for SIG_IGN and SIG_DFL:
#define SIG_DFL ((void (*)(int))0) /* default signal handling */
#define SIG_IGN ((void (*)(int))1) /* ignore signal */
- a definition of a structure called sigaction, which is used
to set a handler for a specific signal:
struct sigaction {
void (*sa_handler)(int);
sigset_t sa_mask;
int sa_flags;
};
- the definition of the names and values that tell sigprocmask() how
to change the set of blocked signals (its how argument):
#define SIG_BLOCK 0 /* for blocking signals */
#define SIG_UNBLOCK 1 /* for unblocking signals */
#define SIG_SETMASK 2 /* for setting the signal mask */
- finally, the definition of a bunch of C library functions that perform
signal-related operations:
void (*signal(int _sig, void (*_func)(int)))(int);
int raise(int sig);
int kill(pid_t pid, int sig);
int sigaddset(sigset_t *mask, int signo);
int sigdelset(sigset_t *mask, int signo);
int sigemptyset(sigset_t *mask);
int sigfillset(sigset_t *mask);
int sigismember(sigset_t *mask, int signo); /* 1 - is, 0 - not, -1 error */
int sigpending(sigset_t *set);
int sigprocmask(int how, sigset_t *set, sigset_t *oldset);
int sigsuspend(sigset_t *sigmask);
int sigaction(int sig, struct sigaction *act, struct sigaction *oldact);
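For readers who have not used these functions, the following sketch shows how a program on a present-day POSIX system might put several of them together. It is my own illustration, not code from Linux 0.01, and the names count_signal and block_then_release are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

/* Illustrative handler: count how many times it has been called. */
static void count_signal(int sig)
{
    (void)sig;
    deliveries++;
}

/* Returns 1 if SIGUSR1 was held while blocked and delivered on unblocking. */
int block_then_release(void)
{
    struct sigaction act, oldact;
    sigset_t block, oldmask, pending;
    int rc, held, delivered;

    /* Fill in the handler, mask and flags fields, then install the handler. */
    memset(&act, 0, sizeof act);
    act.sa_handler = count_signal;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    rc = sigaction(SIGUSR1, &act, &oldact);
    assert(rc == 0);

    /* SIG_BLOCK: add SIGUSR1 to the set of blocked signals. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    rc = sigprocmask(SIG_BLOCK, &block, &oldmask);
    assert(rc == 0);

    /* While blocked, the signal is only recorded as pending. */
    deliveries = 0;
    raise(SIGUSR1);
    sigpending(&pending);
    held = (sigismember(&pending, SIGUSR1) == 1) && (deliveries == 0);

    /* SIG_UNBLOCK: the pending signal is delivered before this returns. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    delivered = (deliveries == 1);

    /* SIG_SETMASK: put the original mask and handler back. */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    sigaction(SIGUSR1, &oldact, NULL);
    return held && delivered;
}
```

In a single-threaded program where SIGUSR1 was not already blocked, block_then_release() should return 1.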
Linux 0.01 vs Minix 1.5.10
If you're still awake at this point, then you are doing well. What
sources of information did Linus use when he wrote this file? We saw
that with errno.h, the most likely source of information
was Minix 1.5. The evidence below suggests that Minix 1.5.10's
signal.h was the source of inspiration for Linux 0.01 signal.h:
- the same protective #ifndef ..., #define ..., #endif around the
file;
- the same definition of sig_atomic_t and nearly the same
definition of sigset_t, except that the latter is promoted
to 32 bits in size with a comment on this promotion in Linux 0.01;
- the same definition of 22 signal names and numbers;
- the same definition of SIG_DFL and SIG_IGN; and
- the same definition of the sigaction structure.
There are some differences though. The Minix 1.5.10 file defines the
signal functions differently to Linux 0.01; in particular, the parameter
names are different (_set vs. set, _oset
becomes oldset etc.). The parameter names are really for
decoration here, and serve no purpose to the compiler, so perhaps
Linus was not so keen on the Minix parameter names.
One important difference is the different definitions of the signal()
function:
void (*signal()) (); /* in Minix 1.5.10 */
void (*signal(int _sig, void (*_func)(int)))(int); /* in Linux 0.01 */
One possible clue here is Linus' comment in the file that he is "trying
to keep headers POSIX''. The POSIX
standard defines the signal() function thus:
void (*signal(int, void (*)(int)))(int);
and Linus has followed the POSIX standard and also decorated his definition
with parameter names.
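As an aside on the 32-bit sigset_t mentioned above: when a signal set is a single 32-bit word, the set-manipulation functions reduce to simple bit operations. The sketch below is my own guess at how that can be done and is not taken from Linux, Minix or System V; every name carries a my_ or MY_ prefix to make clear it is hypothetical, and it assumes unsigned int is at least 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int my_sigset_t;   /* one bit per signal, 32 bits assumed */
#define MY_NSIG 32                  /* signals are numbered 1..MY_NSIG */

/* Signal n lives in bit n-1, so signal 1 is the lowest bit. */
static int my_valid(int signo)
{
    return signo >= 1 && signo <= MY_NSIG;
}

int my_sigemptyset(my_sigset_t *mask)
{
    assert(mask != NULL);
    *mask = 0u;
    return 0;
}

int my_sigfillset(my_sigset_t *mask)
{
    assert(mask != NULL);
    *mask = ~0u;
    return 0;
}

int my_sigaddset(my_sigset_t *mask, int signo)
{
    assert(mask != NULL);
    if (!my_valid(signo))
        return -1;
    *mask |= 1u << (signo - 1);
    return 0;
}

int my_sigdelset(my_sigset_t *mask, int signo)
{
    assert(mask != NULL);
    if (!my_valid(signo))
        return -1;
    *mask &= ~(1u << (signo - 1));
    return 0;
}

/* 1 - is, 0 - not, -1 error: the same convention as the comment in Linux 0.01. */
int my_sigismember(const my_sigset_t *mask, int signo)
{
    assert(mask != NULL);
    if (!my_valid(signo))
        return -1;
    return (int)((*mask >> (signo - 1)) & 1u);
}
```

A one-word set like this cannot hold more signals than the word has bits, which is worth keeping in mind when comparing it with sets defined in other ways.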
Linux 0.01 vs System V R4
Let's now compare the Linux 0.01 signal.h file to the corresponding
file /usr/include/sys/signal.h from the 1990 version of System
V R4.0 for i386:
- the same protective #ifndef ..., #define ..., #endif around the
file, although the macro used is _SYS_SIGNAL_H not _SIGNAL_H;
- no #include'd files;
- sigset_t is defined as a structure containing an array of 4 unsigned
longs called sigbits;
- sig_atomic_t is not defined here, but it is defined in the /usr/ucbinclude/sys/signal.h
file: obviously AT&T got it from the BSD distributions;
- there are 31 signals numbered 1 to 31, with some different names to
Linux 0.01:
| Number | Linux | System V |
| 7 | SIGUNUSED | SIGEMT |
| 10 | SIGUSR1 | SIGBUS |
| 12 | SIGUSR2 | SIGSYS |
| 16 | SIGSTKFLT | SIGUSR1 |
| 17 | SIGCHLD | SIGUSR2 |
| 18 | SIGCONT | SIGCHLD |
| 19 | SIGSTOP | SIGPWR |
| 20 | SIGTSTP | SIGWINCH |
| 21 | SIGTTIN | SIGURG |
| 22 | SIGTTOU | SIGIO |
- SIG_IGN and SIG_DFL are defined as per Linux but with the outside
parentheses missing and no comments;
- sigaction has an extra field: int sa_resv[2];
- SIG_BLOCK, SIG_UNBLOCK, SIG_SETMASK: same values, no comments;
- most of the C functions defined in Linux are defined in System V as
C macros:
#define sigmask(n) ((unsigned long)1 sigbits[0])
#define sigktou(ks,us) ((us)->sigbits[0] = *(ks), \
                        (us)->sigbits[1] = 0, \
                        (us)->sigbits[2] = 0, \
                        (us)->sigbits[3] = 0)
#endif /* !defined(_POSIX_SOURCE) */
I think it's pretty obvious that Linus neither had access to nor used
System V source code to generate his 0.01 signal.h
file.
Since Linux 0.01, the signal.h file has changed and expanded
somewhat, but even the signal.h file from the Linux 2.4.22 distribution
still bears little resemblance to the System V signal.h file;
even a cursory inspection shows that the Minix 1.5.10 signal numbers
are still used here.
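Readers who want to check the numbering on their own machine could use something like the sketch below. It is only an illustration: the names sig_row, rows and print_comparison are mine, and the output depends entirely on the operating system and processor architecture it is compiled on. Only the signals that POSIX requires are listed, so SIGUNUSED and SIGSTKFLT from the table are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>

struct sig_row {
    const char *name;
    int table_value;    /* number in the Linux column of the table above */
    int host_value;     /* number in this system's <signal.h> */
};

static const struct sig_row rows[] = {
    { "SIGUSR1", 10, SIGUSR1 },
    { "SIGUSR2", 12, SIGUSR2 },
    { "SIGCHLD", 17, SIGCHLD },
    { "SIGCONT", 18, SIGCONT },
    { "SIGSTOP", 19, SIGSTOP },
    { "SIGTSTP", 20, SIGTSTP },
    { "SIGTTIN", 21, SIGTTIN },
    { "SIGTTOU", 22, SIGTTOU },
};

#define ROW_COUNT (sizeof rows / sizeof rows[0])

/* Print one line per signal and return how many numbers agree. */
int print_comparison(FILE *out)
{
    size_t i;
    int matches = 0;

    assert(out != NULL);
    for (i = 0; i < ROW_COUNT; i++) {
        int same = (rows[i].table_value == rows[i].host_value);
        fprintf(out, "%-8s table=%2d host=%2d %s\n",
                rows[i].name, rows[i].table_value, rows[i].host_value,
                same ? "same" : "differs");
        matches += same;
    }
    return matches;
}
```

Calling print_comparison(stdout) prints eight lines and returns the number of signals whose value on the host agrees with the table.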
Postscript: errno.h Proliferates
At the beginning I mentioned that, as early as 1978, the signal names
and values from AT&T's original signal.h file had been used
in other systems. The same is true for errno.h. Here is an
example list that I put together in about 30 minutes of searching on Google:
- pe7sys.h
from the port of C-Kermit to the Idris system on the Perkin-Elmer
7000. Copyright attribution to Whitesmiths Ltd. in 1978.
- errno.h
from the Microsoft Quick C compiler. Copyright attribution to Microsoft
Corporation.
- tclErrno.h
from the Tcl source. This has copyright attributions to the Regents
of the University of California and Sun Microsystems, Inc.
- need_errno.h
from a package called RandomDan. This has copyright attributions to
Microsoft Corporation and may have been derived from a Microsoft C
compiler.
- errno.h
from FreeDOS. Copyright attribution to Borland International.
- errno.h
from the LIVSIX package. Copyright attribution to Motorola and others.
- arch.h
from the lwIP package. Copyright attribution to the Swedish Institute
of Computer Science.
AT&T did not put copyright notices on the "ABI files'' from
3rd Edition UNIX in 1973 up to and including the first release of
System V in 1983. It makes you wonder, if Whitesmiths were putting
copyright notices on their files in 1978, who really can claim
copyright on the content of these files?
[1] These links are
merely to demonstrate that the signal names and numbers have been used
elsewhere. Copyright notices in the linked files should be observed. Copyrighted materials may not be used without the permission of the author.