GROKLAW
When you want to know more...
May I Please Pick Your Brains? Request for Info on Software Terms
Friday, May 26 2006 @ 03:58 AM EDT

May I please pick your brains? The Open Source as Prior Art project is looking to build a dictionary (they call it a thesaurus) of software terms to use in creating a taxonomy for use in its electronic source code publication system. A description of the purpose of the project and the publication process can be found here.

Rather than build the thesaurus from scratch, OSAPA is looking for examples of other collections of software terms. It seems everyone thought the US Patent Office had such a list, but it turns out they don't, which might just explain why they seem to have so much trouble finding prior art.

Do you know of any such collection? If you know of any, you can post it here as a comment and I'll collect it all and send it along or you can post it directly to the osapa.org wiki or send it to the osapa.org mailing list at http://lists.osdl.org/mailman/listinfo/priorart-discuss.

Thank you for any help you can provide. I noticed in researching that there was just an international workshop this week on mining software repositories, and I've written to the folks that sponsored that conference, hoping someone there might know. Then I realized some of you right here on Groklaw would know, if anyone would, where to find any such collections, if they exist. So I offered to ask you.

To explain further, here's the email message posted to the osapa.org mailing list that caught my eye:

Date: Tue, 23 May 2006 12:51:10 -0700
From: "Diane Peters"
Subject: [priorart-discuss] Software Thesaurus
To: "'OSS and USPTO prior art discussions'"

Message-ID:
Content-Type: text/plain;
charset="us-ascii"

Hi everyone,

As some of you may recall, coming out of our February meetings in D.C. we were hoping to receive from the USPTO a thesaurus of types that (we understood) was accessed by patent examiners when searching for prior art in the software patent field. Our plan had been to post the thesaurus and encourage developers in the community to annotate and otherwise add to its terms. The thesaurus could then serve as a common reference point for communications between the USPTO and the community. It could be used by developers to describe code when electronically published; it could also be used by the USPTO to locate that code. Such a thesaurus might have other interesting and valuable uses as well.

It now appears that the USPTO does not have a thesaurus for the various software technologies, so we will need to locate another or start building a thesaurus/library/glossary ourselves.

The USPTO has offered this paper http://www.netlib.org/utk/papers/dig-lib/main.html as a suggested starting point, particularly the projects discussed in the first section following the intro, and the indexing discussed thereafter. There's also mention of IEEE standards for library data models.

We will add this job to the wiki under the "longer jobs" section of "Things that can be done right now." If you know of other useful starting points, pls feel free to add to the wiki in that same location. Once we have a sense of what's been done and what may be useful, we can either select one as a starting point or break out what's useful from each and build from there.

Diane M. Peters, General Counsel
Open Source Development Labs, Inc.

I took a look at the paper the USPTO suggested looking at. The Netlib is a collection of mathematical software, from all I can tell. Their search page suggests using the GAMS class hierarchy or the freeWAIS-sf query syntax. I have no idea what that is, but I'm just telling you what I'm finding. The NHSE page indicates that the project ran out of funding in 2004, and it mentions something called Repository in a Box, a toolkit developed in 1996:

From 1994 - 2004, NHSE existed as a distributed collection of software, documents, data, and information of interest to the high performance and parallel computing community. The significance of the collaborative effort is evident through the many useful reports and tools generated as well as the many repositories that have been created, and are still being created, with the Repository in a Box (RIB) toolkit developed in 1996. However, continued operation of the site without funding has become impractical. Therefore, the site has been taken down.

The NHSE meta-repository, which consists of metadata describing software applications and tools from the PTLib, HPC-Netlib and BenchWeb repositories combined, is still available. However, since PTLib and HPC-Netlib are no longer maintained, the metadata from those repositories are frozen in time. Only the BenchWeb content is still maintained.

The Netlib collection of mathematical software and other tools is still maintained and we recommend you visit that repository. Links to the archived NHSE repositories mentioned above can be found on that site as well.

On behalf of all of the federal agencies and institutions that helped make NHSE possible, we would like to thank all of the contributors over the years who submitted tools, links, applications, and other useful and usable material to the collection.

Many thanks.

NHSE Technical Team
nhse AT cs.utk.edu

So that is where I am in my research, and if you have other suggestions, I'd be very interested.


  


May I Please Pick Your Brains? Request for Info on Software Terms | 110 comments
Comments belong to whoever posts them. Please notify us of inappropriate comments.
Off-topic comments here
Authored by: Naich on Friday, May 26 2006 @ 04:05 AM EDT
This is the non-anonymous off topic thread. Please do not use anonymous
threads.

Put corrections here
Authored by: Naich on Friday, May 26 2006 @ 04:07 AM EDT
Thank you.

Jargon file?
Authored by: rsmith on Friday, May 26 2006 @ 04:26 AM EDT

A start might be the jargon file (framed version). The jargon file homepage also has other interesting parts. One of my favorites being the Story of Mel.

---
Intellectual Property is an oxymoron.

The Free On-line Dictionary of Computing
Authored by: Anonymous on Friday, May 26 2006 @ 04:47 AM EDT
How about "The Free On-line Dictionary of Computing", http://www.foldoc.org/

May I Please Pick Your Brains? Request for Info on Software Terms
Authored by: Magpie on Friday, May 26 2006 @ 04:53 AM EDT

What about this? Seems to be supported by Imperial College (London University)

http://foldoc.org/

Wikipedia
Authored by: Anonymous on Friday, May 26 2006 @ 05:20 AM EDT
I use Wikipedia a lot.

It also uses hyperlinks wherever possible! Which means it ignores that
obnoxiously silly idea of ontologies...

Wikipedia also uses the GFDL (GNU Free Documentation License), the literary
equivalent of the GPL. There is also a thesaurus collection within Wikipedia.

Taxonomy vs. Dictionary
Authored by: Anonymous on Friday, May 26 2006 @ 05:29 AM EDT
A taxonomy is different from a dictionary, though a dictionary might be useful source material. A taxonomy says things like "to describe this concept, always use this word", and "the meaning of word A is entirely contained in that of word B".

For a good example, see UK IPSV. It doesn't have nearly enough detail in the computing area, but it might be a good place to start hanging more detail off.
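The two kinds of rule described above (one preferred word per concept, and one term's meaning contained in another's) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms, the `PREFERRED` and `BROADER` tables, and the function names are made up for the example and are not taken from IPSV or any other real vocabulary.

```python
# Hypothetical controlled vocabulary; every term here is illustrative only.
# "To describe this concept, always use this word": variant -> preferred term.
PREFERRED = {
    "hash map": "hash table",
    "hashmap": "hash table",
    "dictionary (data structure)": "hash table",
}

# "The meaning of word A is entirely contained in that of word B": A -> B.
BROADER = {
    "hash table": "data structure",
    "binary tree": "data structure",
    "data structure": "software concept",
}


def normalize(term: str) -> str:
    """Map a free-text term to the single preferred term for its concept."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def is_contained_in(narrow: str, broad: str) -> bool:
    """Return True if `narrow` equals `broad` or sits somewhere beneath it."""
    current = normalize(narrow)
    target = normalize(broad)
    seen = set()
    while current not in seen:  # the `seen` set guards against cycles
        if current == target:
            return True
        seen.add(current)
        if current not in BROADER:
            return False
        current = BROADER[current]
    return False
```

A plain dictionary only needs the first table; it is the second table that turns a word list into a taxonomy.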

NHSE is no longer, RIB still exists, but ACM taxonomy may be more useful
Authored by: leopardi on Friday, May 26 2006 @ 05:55 AM EDT
See the RIB home page. RIB uses IEEE Standard 1420.1, Basic Interoperability Data Model (BIDM), and adds NHSE extensions. Unfortunately, the data model is just that, and is neither a taxonomy nor a glossary.

The ACM Taxonomy or the ACM Computing Classification System may be more useful.

Wikipedia
Authored by: pogson on Friday, May 26 2006 @ 06:15 AM EDT
The world built Wikipedia.org and it contains almost any term in which we might be interested concerning prior art.

As a test, I searched for

  • superheterodyne
  • RAID
  • windows
  • iefbr14
  • FOCAL
  • quicksort
  • logarithmic and
  • feedback
and obtained useful hits. If you find something not in Wikipedia, create an article or modify the appropriate article. An important feature of Wikipedia is that the USPTO or anyone may clone the database and/or add prior art as it is discovered.

---
http://www.skyweb.ca/~alicia/ , my homepage, an eclectic survey of topics: berries, mushrooms, teaching in N. Canada, Linux, firearms and hunting...

Perhaps this W3C Document Object Model could be of use
Authored by: Sean DALY on Friday, May 26 2006 @ 06:31 AM EDT
See http://www.w3.org/TR/DOM-Level-2-Core/glossary.html

May I Please Pick Your Brains? Request for Info on Software Terms
Authored by: gbl on Friday, May 26 2006 @ 07:04 AM EDT
Eric Raymond has experience of building dictionaries. Perhaps someone could approach him to help?

---
If you love some code, set it free.

May I Please Pick Your Brains? Request for Info on Software Terms
Authored by: gvc on Friday, May 26 2006 @ 07:40 AM EDT
WAIS is "Wide Area Information Servers," pioneered by Brewster Kahle. It was started before the Web, using a network of search engines for information retrieval requests. freeWAIS is a free implementation of the system, and the "query syntax" is the particular protocol used to communicate between WAIS clients and servers.

Kahle went on to found the Internet Archive, including the Wayback Machine that has been mentioned here.

IETF RFC
Authored by: Chaosd on Friday, May 26 2006 @ 07:47 AM EDT
The RFC Database might hold some good material. There is also a good general reference site at Zytrax - not specifically a dictionary, but quite a lot of web/PC related discussion.

---
-----
No question is stupid || All questions are stupid

GAMS
Authored by: gvc on Friday, May 26 2006 @ 07:48 AM EDT
GAMS is "Guide to Available Mathematical Software." See NIST's description.

"Class hierarchy" is jargon from the Object Oriented Programming (C++ and friends) bandwagon. All it means in this case is a taxonomy that you can search -- mathematical software has been organized by GAMS into groups, subgroups, etc. much as biologists have organized life on this planet.

Like WAIS, GAMS has a search service, and the document is referring to the particular protocol used to navigate and search within this taxonomy.

Unix Man pages
Authored by: Chaosd on Friday, May 26 2006 @ 07:59 AM EDT

Almost forgot: the Unix man (and info) pages. They can be easily searched using the apropos and whatis commands, and include the following (Unix-related) sections:

  1. Executable programs or shell commands
  2. System calls (functions provided by the kernel)
  3. Library calls (functions within program libraries)
  4. Special files (usually found in /dev)
  5. File formats and conventions, e.g. /etc/passwd
  6. Games
  7. Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
  8. System administration commands (usually only for root)
  9. Kernel routines [Non standard]

Most importantly the man pages usually contain Author, Copyright, History and 'see also' sections.
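To give a feel for how the man pages could feed a term list, here is a minimal Python sketch that turns apropos-style output into (term, section, summary) records. It assumes the common "name (section) - description" line layout; the `SECTIONS` table just restates the list above, and `parse_apropos` is a name invented for this example.

```python
import re

# Traditional man page section numbers, as listed above.
SECTIONS = {
    "1": "Executable programs or shell commands",
    "2": "System calls",
    "3": "Library calls",
    "4": "Special files",
    "5": "File formats and conventions",
    "6": "Games",
    "7": "Miscellaneous",
    "8": "System administration commands",
    "9": "Kernel routines",
}

# Matches lines such as "printf (3) - formatted output conversion".
LINE_RE = re.compile(
    r"^(?P<names>.+?)\s*\((?P<section>[^)]+)\)\s+-\s+(?P<summary>.*)$"
)


def parse_apropos(output: str) -> list[dict]:
    """Turn apropos-style text into one record per term and section."""
    entries = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore lines that do not follow the assumed layout
        section = match.group("section")
        # One line may name several pages: "printf, fprintf (3) - ..."
        for name in match.group("names").split(","):
            entries.append({
                "term": name.strip(),
                "section": section,
                "category": SECTIONS.get(section[:1], "Unknown"),
                "summary": match.group("summary").strip(),
            })
    return entries
```

On a system that has apropos installed, the input text could come from something like `subprocess.run(["apropos", "printf"], capture_output=True, text=True).stdout`.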

---
-----
No question is stupid || All questions are stupid

How do I Contact You?
Authored by: gvc on Friday, May 26 2006 @ 08:10 AM EDT
Pamela,

I am an information retrieval researcher and there might be an interesting
project here. How do I contact you? (Or you can contact me if you like.) I
have a vague recollection that it is possible to send you email, but I don't see
any such link. Perhaps I'm being blind.

thanks,

gvc

Google Scholar/Full-text Information Retrieval with Citation Indexing
Authored by: rdc3 on Friday, May 26 2006 @ 08:33 AM EDT

I would suggest that the OSDL Prior Art project contact Google with respect to this task. Google makes its business out of "organizing the world's information."

Classification of prior art publications using taxonomies or thesauri is becoming increasingly irrelevant in the context of powerful full-text information retrieval techniques with citation indexing. Google and Google Scholar are my preferred tools. Google Scholar has the advantage of helping you move forward to find newer publications that cite prior works.

The quality of links between documents is the determining factor in citation searching. Google, CiteSeer and other systems could be greatly improved by standardized forms of linking.

Software systems link to each other through APIs, which may frequently be formalized using standard data formats and protocols. In classifying software systems to aid future search, one of the best things that could be done is to document the specific standards (RFCs, W3C specs, ISO documents, etc.) and APIs that are used by the software. Indeed, a set of standard identifiers for these items could be considered a controlled vocabulary. However, it is a controlled vocabulary that naturally grows over time, as new protocols, formats and technology bases become widely used.

Beyond linking to the formal specs, it would also be useful to link to academic publications addressing any fundamental techniques used (e.g., novel data structures or algorithms). Of course, references to patents also make sense, where they are known. Again, a set of standardized identifiers for publications makes a natural controlled vocabulary that grows over time.
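As a rough illustration of identifiers acting as a controlled vocabulary, the Python sketch below inverts a "project uses these specs" table into a "spec is used by these projects" index. The project names are hypothetical and the pairing of projects with RFC numbers is invented for the example; only the idea of indexing by standard identifier comes from the text above.

```python
from collections import defaultdict

# Hypothetical project records; the names and identifier lists are made up.
PROJECTS = {
    "example-mail-client": ["RFC 2822", "RFC 3501"],
    "example-web-server": ["RFC 2616", "RFC 2818"],
    "example-feed-reader": ["RFC 2616", "RFC 4287"],
}


def build_index(projects):
    """Invert project -> identifiers into identifier -> sorted project names."""
    index = defaultdict(set)
    for project, identifiers in projects.items():
        for identifier in identifiers:
            index[identifier].add(project)
    return {identifier: sorted(names) for identifier, names in index.items()}


def projects_using(index, identifier):
    """Look up every project that declares the given standard identifier."""
    return index.get(identifier, [])
```

The vocabulary grows by itself here: a new identifier enters the index the first time any project declares it.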

IBM Dictionary of Computing
Authored by: DL on Friday, May 26 2006 @ 08:50 AM EDT
ISBN 0070314888

It's over 700 pages, so it's pretty comprehensive.

It's old-school IBM-centric, but considering the long history and number of
patents IBM has, it should be a worthwhile reference. It is useful for
decrypting IBMisms into other IBMisms.

I don't know that IBM would be willing to contribute from it for this project.
As far as I know, it hasn't been updated in a while.

The Jargon file already mentioned is comprehensive, but the tone is quite
irreverent. Fair warning: some of the entries in this one are not suitable for
posting on this site, however accurate they may be.

---
DL

Encyclopedia of Computer Science and Engineering
Authored by: lisch on Friday, May 26 2006 @ 09:26 AM EDT
The Encyclopedia of Computer Science and Engineering (ISBN 0442276796) is a classic standard reference. The latest edition is a bit pricey for most people's day-to-day use, but this massive tome would probably help OSAPA.

Dictionary of Computing
Authored by: DaveJakeman on Friday, May 26 2006 @ 09:32 AM EDT
ISBN 0-7221-6595-1

First published by Oxford University Press, 1983. Later published by Sphere
Books Ltd, 30-32 Gray's Inn Road, London WC1X 8JL.

From the back page:

"The Dictionary of Computing is the essential reference for all those
professionally involved in computing, both in academic and industrial life. It
is also suitable for people who have had no previous contact with computers but
now find they need specific reliable information, or for those with personal
computers who want to find out more about the subject.

"This dictionary contains over 3,750 terms used in computing. Terms which
range in complexity from basic ideas and equipment to graduate-level computer
science. Where relevant, entries are supplemented by instructive diagrams and
tables.

"The entries have been written by practitioners in all branches of computing
under the scrutiny of distinguished scholars from both sides of the
Atlantic."

---
Champagne for my real friends, real pain for my sham friends - Francis Bacon
---
Should one hear an accusation, try it out on the accuser.

netlib
Authored by: jesse on Friday, May 26 2006 @ 09:40 AM EDT
I took a look at the paper the USPTO suggested looking at. The Netlib is a collection of mathematical software, from all I can tell. Their search page suggests using the GAMS class hierarchy or the freeWAIS-sf query syntax. I have no idea what that is, but I'm just telling you what I'm finding.

netlib is a holdover from before the web existed. In those days it used E-mail to provide queries for data, and E-mailed replies for the results.

The "FreeWAIS-sf" query was a modified (i.e., without license restrictions) WAIS (Wide Area Information Servers) with scientific format (I think that was the "-sf"). The query itself did not use the current key+key type syntax, but depended on each key located in the document, with adjacency computation (i.e., the number of significant words between the query key words) to determine relative scoring. The results were then sorted and E-mailed back to the requestor.

The definition of "significant words" was usually any word NOT in a dictionary of ignored words. The database was a keyed file by word, document, starting location in the document, and ending location in the document, along with an initial weight value generated during indexing. I don't know what the weighting was based on, possibly things like how frequently the word appeared, the number of words between occurrences...
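The kind of scoring described above can be sketched roughly in Python. This is not freeWAIS-sf code, just an illustration under stated assumptions: the ignored-word list, the function names and the scoring formula (1 divided by 1 plus the number of significant words between the query words) are all made up here, and the index-time weights are left out entirely.

```python
from collections import defaultdict

# Illustrative stop list; a real "dictionary of ignored words" would be larger.
IGNORED = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def significant_words(text):
    """Lowercase the text and drop punctuation and ignored words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in IGNORED]


def build_index(documents):
    """Map word -> doc id -> positions, counting significant words only."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(significant_words(text)):
            index[word][doc_id].append(position)
    return index


def score(index, query):
    """Rank documents containing every query word; closer words score higher."""
    keys = significant_words(query)
    if not keys:
        return []
    docs = set(index.get(keys[0], {}))
    for key in keys[1:]:
        docs &= set(index.get(key, {}))
    results = []
    for doc_id in docs:
        gap = 0
        # Sum, over consecutive query words, the smallest number of
        # significant words separating them in this document.
        for first, second in zip(keys, keys[1:]):
            gap += min(abs(p - q)
                       for p in index[first][doc_id]
                       for q in index[second][doc_id]) - 1
        results.append((doc_id, 1.0 / (1 + max(gap, 0))))
    return sorted(results, key=lambda item: (-item[1], item[0]))
```

Because positions are counted after the ignored words are dropped, "lists of links" and "lists links" look identical to the scorer, which matches the "significant words between the query key words" idea.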

This was all done before Yahoo was a gleam in somebody's eye. The major searches were done by "Veronica", which was based on gopher and the non-free WAIS engine.

Now gopher - was a pre-web browser that used a basic TCP connection (similar to telnet) for the browser. The user would use the gopher client and specify the target server (a complete imitation of telnet at the command line). The server would then present the root level document which was basically a table of contents + a header and footer. The user then used the arrow keys to step down the page and select the content desired (I believe a tab would jump to the next key entry - basically a URL). The gopher server would supply the data just like a web server does, one data file, then disconnect. It was up to the gopher client to provide the interface for reading the file, and identifying the links included. This is why web browsers still have a "gopher://..." link. The gopher server didn't care what kind of browser you were using. So the first web server was likely a gopher server that provided files in HTML format.

Oh, one other thing. Gopher servers listened on port 70 by default, and web browsers (and web servers) kept the same habit of a well-known default port, assuming port 80 for the default URL "http://..".
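The one-request-then-disconnect exchange described above is simple enough to sketch in Python. This follows the menu layout in RFC 1436 (a type character and display text, then tab-separated selector, host and port, with a lone period ending the list) and that RFC's default port of 70. The function names are invented for the example, and `fetch` is untested against any particular server.

```python
import socket


def parse_menu(raw: str) -> list[dict]:
    """Parse a Gopher menu: type char + display text, then tab-separated fields."""
    items = []
    for line in raw.splitlines():
        if line == ".":  # a lone period ends the listing
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].isdigit():
            continue  # skip malformed lines
        items.append({
            "type": line[0],  # e.g. "0" is a text file, "1" is a directory
            "display": fields[0],
            "selector": fields[1],
            "host": fields[2],
            "port": int(fields[3]),
        })
    return items


def fetch(host: str, selector: str = "", port: int = 70) -> str:
    """Send one selector, then read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server has sent the file and disconnected
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

An empty selector asks for the root menu, so `parse_menu(fetch(host))` would give the table of contents the client then lets the user step through.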

I used this search for an early information search for a web browser based database of user questions (early FAQ) support for a DoD MSRC helpdesk. (This was before forms were available in web standards. The only input field available was a "non-standard" WAIS query field, which was a single input text line in a web page.)

  • netlib - Authored by: Anonymous on Friday, May 26 2006 @ 02:11 PM EDT
The Digital Dictionary
Authored by: DaveJakeman on Friday, May 26 2006 @ 09:46 AM EDT
Digital Press (presumably now HP, if it still exists as such)
Digital order number: EY-3433E-DP
ISBN: 0-932376-82-7
659 pages.

From the back page:

"Based on the work of DEC engineers, documentation writers and educational
specialists, the Dictionary guides users of Digital Equipment Corporation's
products through the maze of technical terms, mnemonics and acronyms used to
identify or describe them. It is an indispensable sourcebook for computer
specialists, technical writers, course developers, instructors, students and
translators."

Contains a mixture of generic and DEC-specific computing terms.

---
Champagne for my real friends, real pain for my sham friends - Francis Bacon
---
Should one hear an accusation, try it out on the accuser.

How does the USPTO work without any references?
Authored by: talexb on Friday, May 26 2006 @ 10:04 AM EDT
It puzzles and astounds me as to how the US Patent Office can make what they
consider to be good decisions on awarding any kind of patent without a fair bit
of relevant technical know-how. To me, it's almost an admission that they were
guessing, and suggests that many, if not all, software patents may need to be
re-examined.

It reminds me of Alice Through the Looking Glass. How bizarre.

NIST has DADS
Authored by: epostma on Friday, May 26 2006 @ 10:15 AM EDT

I've recently discovered the Dictionary of Algorithms and Data Structures by NIST. It's quite good, but it only deals with the theoretical side of computer science.

Erik.

Numerical Recipes in *
Authored by: DL on Friday, May 26 2006 @ 11:15 AM EDT
Where * is one of several programming languages.

These are cookbooks containing hundreds of useful algorithms--all that math
you've forgotten since school--that you can put in your code. Some of these
books have over 1,000 pages, so they cover a lot of material.

The first edition of Numerical Recipes in C came out in 1988. That predates
most, if not all, software patents. Regardless, only patents issued less than 2
years prior to its publication have not expired.

You'll want the newer editions for real work, but that first edition will be the
most effective for establishing prior art.

---
DL

More scientific algorithm categories
Authored by: jturner on Friday, May 26 2006 @ 12:52 PM EDT

The proprietary NAG library is organized like this. The "complete functional summary" PDF here relates to another widely-used (but far from free) data analysis system. Netlib may be the biggest and oldest repository for free algorithms.

In terms of disciplines, scipy categorizes free Python packages like this.

I'll see if I can think of more. It is much easier to list (free or otherwise) packages/algorithms than to categorize them...

Jargon File
Authored by: Anonymous on Friday, May 26 2006 @ 02:03 PM EDT
I doubt much of it is useful, but I suppose The Jargon File fits within the rubric of the broad description...

Netlib->Free, Matlab commercial
Authored by: Anonymous on Friday, May 26 2006 @ 05:56 PM EDT
Interestingly many of the core free math routines from Netlib are used in the
very successful commercial product matlab. A large part of the documentation for
the bigger matlab routines refers directly to netlib functions and white
papers.

What is a Thesaurus?
Authored by: Ted Powell on Friday, May 26 2006 @ 09:41 PM EDT
The following quote is from WWW -- Wealth, Weariness or Waste : Controlled vocabulary and thesauri in support of online information access by Professor David Batty:
A thesaurus, to a layman, is a fat book prepared by somebody called Peter Mark Roget and used by college students to enlarge their vocabulary when writing term papers -- and, often and unfortunately, to vary the representation of the same concept from sentence to sentence. A thesaurus to an information scientist is a controlled set of the terms used to index information in a database, and therefore also to search for information in that database so the same concepts are represented by the same term. For many years in this country, thesauri were often presented as alphabetized lists of key terms, taken from the document to be indexed with references to and from other terms made as necessary. This traditional practice has changed in recent years to a more structured approach based on an analytical technique. Ironically, this means that the original misuse of the word "thesaurus" by information scientists, to describe purely alphabetical lists of terms (Roget organized his thesaurus by categories of knowledge, and included an alphabetized list of terms only as an index), has been amended so that it is now closer to a proper use of Roget's meaning to include both categorization and alphabetical listing.

The hierarchical structure is especially important when looking for prior art overlapping a patent claim, given that the claimant has a certain incentive to avoid the use of common nomenclature. The UNESCO Thesaurus: hierarchical list is a nice example of a hierarchy that covers a lot of ground.

The site's home page has explanations of common structural terms, such as BT, RT, NT: Broader Term, Related Term, Narrower Term, respectively.
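To show why the BT/NT/RT links help with a search, here is a small Python sketch in which a query on a broad term is expanded to everything filed beneath it. The `Thesaurus` class, its method names and the sample terms are assumptions made for the example; they do not come from the UNESCO Thesaurus.

```python
from collections import defaultdict


class Thesaurus:
    """Tiny thesaurus holding BT (broader), NT (narrower), RT (related) links."""

    def __init__(self):
        self.bt = defaultdict(set)
        self.nt = defaultdict(set)
        self.rt = defaultdict(set)

    def add_broader(self, term, broader):
        # BT and NT are reciprocal: recording one implies the other.
        self.bt[term].add(broader)
        self.nt[broader].add(term)

    def add_related(self, a, b):
        # RT is symmetric.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def expand(self, term):
        """Return the term plus every narrower term beneath it, sorted."""
        found, stack = set(), [term]
        while stack:
            current = stack.pop()
            if current in found:
                continue
            found.add(current)
            stack.extend(self.nt.get(current, ()))
        return sorted(found)
```

A searcher who only knows the broad concept can then match documents indexed under any of the expanded terms, whatever wording a claim happens to use.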

---
"If you don't have the source code, you are probably going to
be screwed in the long run." --Philip Greenspun

IEEE Standard Software Engineering Terminology
Authored by: Anonymous on Monday, May 29 2006 @ 06:46 PM EDT

It seems that the IEEE standard might be a good one to use, since it was created by the software profession, it's been around a long time, and it was also recently updated.

IEEE Standard Glossary of Software Engineering Terminology (rev 2002)
http://standards.ieee.org/reading/ieee/std/se/610.12-1990.pdf

Seems to cost money though.

J

What about the Linux Dictionary at the Linux Documentation Project?
Authored by: Anonymous on Wednesday, May 31 2006 @ 11:17 PM EDT
You can find it here.

Groklaw © Copyright 2003-2013 Pamela Jones.
All trademarks and copyrights on this page are owned by their respective owners.
Comments are owned by the individual posters.

PJ's articles are licensed under a Creative Commons License. ( Details )