May I please pick your brains? The Open Source as Prior Art project is looking to build a dictionary (they call it a
thesaurus) of software terms to use in creating a taxonomy for its
electronic source code publication system. A description of the purpose of the project and the publication
process can be found
Rather than build the thesaurus from scratch, OSAPA is looking for examples
of other collections of software terms. It seems everyone thought the US Patent Office had such a list, but it turns out they don't, which might just explain why they seem to have so much trouble finding prior art.
Do you know of any such collection? If so, you can post it here as a comment, and I'll collect everything and send it along, or you can post it directly
to the osapa.org wiki or send it to the osapa.org mailing list at
Thank you for any help you can provide. While researching, I noticed that an international workshop on mining software repositories was held just this week, and I've written to the folks who sponsored it, hoping someone there might know. Then I realized that some of you right here on Groklaw would know, if anyone would, where to find such collections, if they exist. So I offered to ask you.
To explain further, here's the email message posted to the osapa.org mailing list that caught my eye:
Date: Tue, 23 May 2006 12:51:10 -0700
From: "Diane Peters"
Subject: [priorart-discuss] Software Thesaurus
To: "'OSS and USPTO prior art discussions'"
As some of you may recall, coming out of our February meetings in
D.C. we were hoping to receive from the USPTO a thesaurus of types that (we
understood) was accessed by patent examiners when searching for prior
art in the software patent field. Our plan had been to post the
thesaurus and encourage developers in the community to annotate and
otherwise add to its terms. The thesaurus could then serve as a
common reference point for communications between the USPTO and the
community. It could be used by developers to describe code when
electronically published; it could also be used by the USPTO to locate
that code. Such a thesaurus might have other interesting and valuable
uses as well.
It now appears that the USPTO does not have a thesaurus for the
various software technologies, so we will need to locate another or
start building a thesaurus/library/glossary ourselves.
The USPTO has offered this paper
http://www.netlib.org/utk/papers/dig-lib/main.html as a suggested
starting point, particularly the projects discussed in the first
section following the intro, and the indexing discussed thereafter.
There's also mention of IEEE standards for library data models.
We will add this job to the wiki under the "longer jobs"
section of "Things that can be done right now." If you know of other useful
starting points, pls feel free to add to the wiki in that same
location. Once we have a sense of what's been done and what may be
useful, we can either select one as a starting point or break out
what's useful from each and build from there.
Diane M. Peters, General Counsel
Open Source Development Labs, Inc.
I took a look at the paper the USPTO suggested. As far as I can tell, Netlib is a collection of mathematical software. Its search page suggests using the GAMS class hierarchy or the freeWAIS-sf query syntax. I have no idea what those are, but I'm just telling you what I'm finding. The
NHSE page indicates that the project ran out of funding in 2004, and it mentions something called Repository in a Box, a toolkit developed in 1996:
From 1994 - 2004, NHSE existed as a distributed collection of software, documents, data, and information of interest to the high performance and parallel computing community. The significance of the collaborative effort is evident through the many useful reports and tools generated as well as the many repositories that have been created, and are still being created, with the Repository in a Box (RIB) toolkit developed in 1996. However, continued operation of the site without funding has become impractical. Therefore, the site has been taken down.
The NHSE meta-repository, which consists of metadata describing software applications and tools from the PTLib, HPC-Netlib and BenchWeb repositories combined, is still available. However, since PTLib and HPC-Netlib are no longer maintained, the metadata from those repositories are frozen in time. Only the BenchWeb content is still maintained.
The Netlib collection of mathematical software and other tools is still maintained and we recommend you visit that repository. Links to the archived NHSE repositories mentioned above can be found on that site as well.
On behalf of all of the federal agencies and institutions that helped make NHSE possible, we would like to thank all of the contributors over the years who submitted tools, links, applications, and other useful and usable material to the collection.
NHSE Technical Team
nhse AT cs.utk.edu
So that is where I am in my research, and if you have other suggestions, I'd be very interested.