Mike Anderer, "MIT" Scientists, Spectral Analysis, and a Patent Application
Friday, September 30 2005 @ 03:50 AM EDT
Stats_for_all has found something truly fascinating. There is a published patent application, #20050216898, filed September 13, 2004 and just published September 29, 2005, for a "System for software source code comparison." One of the inventors is a Michael Anderer of Salt Lake City, Utah. That wouldn't be Darl McBride's old pal, Mike Anderer, now, would it?
You remember him, don't you? Remember his leaked memo with all the misspelled words back in March of 2004 that revealed that BayStar was a Microsoft referral and that Microsoft sent $86 million SCO's way, "including BayStar", thanks to Anderer? And then a few days later, he wrote a letter which he published on NewsForge about Microsoft's plan to destroy Linux, and he described himself like this:
"I will file close to 20 patents this year for companies in many spaces, including homeland security, anti-terrorism, several grid computing and virtual machine patents, and, ironically, I should have one issued in the expiring and disappearing e-mail arena. It was initiated 4-5 yrs ago."
Did he neglect to tell us about a patent for a system for software code comparison? When you read the patent description, you'll even find "spectral analysis" mentioned. Have we fingered the "MIT" deep divers at last?
Anderer also revealed what he had thought was going to happen in the SCO litigation:
"I think one real issue, that people are skirting, is who will be the ultimate guarantor of IP-related issues in a world that is governed by the GPL and GPL-like licenses. I could easily see IBM, HP, Sun, and many of the other large hardware players solving this problem tomorrow by settling the dispute with SCO and maybe even taking the entire code base and donating it into the public domain. I know this is what I originally thought would happen, at least the settlement part."
So sad, when dreams of wealth die, huh? Why would Sun, for one, pay anything for code it has already open sourced, very carefully in a way to keep the GPL out, and then put that code in the public domain? Why would IBM reward SCO's vicious behavior toward them? It probably sounded like a great strategy back in the planning stage to Darl and Mike, but look at it now, how it has played out.
Groklaw already covered the early days and the friendship between Anderer and McBride going back to the '90s, and their business adventures together before SCO began suing the world, and even then, we said that Anderer seemed more involved in the SCO strategy than at first appeared. Now comes this patent application, which indicates even deeper involvement, if our guess that it's the same man proves to be accurate. You might recognize some of the other inventors too, from PointServe: Edward Powell and Mark Lane and also Ed White, who was "formerly co-founder and Vice-President of Engineering of PointServe". When IBM sent PointServe a subpoena, we wrote this:
You remember PointServe. Darl McBride was CEO at PointServe prior to joining FranklinCovey in 2000. And guess who is on the board at PointServe? McBride's old friend, Mike Anderer, the old pal who helped McBride come up with SCO's IP litigation strategy and approached Microsoft on SCO's behalf, according to Newsweek's Brad Stone, which led to the BayStar hookup.
Frank Sorenson noticed something else that is positively riveting. Check out PointServe's management page:
G. Edward Powell, Chairman and Chief Executive Officer - ... For eight years prior to founding PointServe, Dr. Powell was a Member of the Technical Staff at the MIT’s Lincoln Laboratory
Mark T. Lane, Chief Scientist - ... Dr. Lane spent 10 years as a member of the Technical Staff at MIT Lincoln Laboratory
So, what do you think? Has IBM found the "MIT rocket scientists" Darl bragged about and then backed away from and that we imagined had somehow disappeared into the mist or into the Bermuda Triangle, thus becoming, alas, unavailable to testify for SCO as experts at trial about their purported comparisons of UNIX/Linux code? Looks to me like we might have found not only the "MIT" rocket scientists SCO first bragged about and then clammed up about, but perhaps their method as well. Note some of what the patent says the invention can do:
[0034] For example, the present invention allows an owner of proprietary code to submit their code to a website which compares the code against a database of open source code bases. The database of open source code bases may be an open source UNIX or Linux distribution, for example. In this capacity, the system of the present invention would be used for auditing proprietary code to determine if it contained open source software, or if a particular open source software release contained proprietary software. This audit could be scheduled to run on a periodic basis automatically.
[0035] In another aspect, the database of open source code bases may contain a number of popular open source applications of a certain type, such as image manipulation or audio processing applications that may be protected by trade secrets or patents. In this aspect, the input file may be patent claims or design specifications containing concepts that are compared against the concepts in the source code in the database. Thus, while the structure of the two corpuses is different (patent claims on the one hand and source code on the other) it is still possible according to the system of the present invention to determine whether they share concepts in common. . . .
[0038] For example, in another exemplary embodiment, the system of the present invention may take search phrases for comparison against a target corpus of patents, or patent claims for use in a patent search. As discussed earlier, the system parses the search terms for concepts based on natural language processing methods, and assigns raw power values based on the frequency of the concept in the target corpus. In a further aspect, the system may analyze each file in the target corpus (each patent in a patent database, for example), and replace each instance of each concept in the specifications with that concept's respective power. In addition, or in the alternative, the concept may be replaced by its power in the claims, in the case of an infringement analysis. . . .
[0043] More particularly, the process of profiling the corpus involves a multi-source characterization of that corpus along with a one-way transform assigned to preserve the confidentiality, secrecy and integrity of that original code document. Because the corpus may contain trade secrets or other proprietary intellectual property information, it may be necessary to use cryptographic methods to convert that corpus, which is readable by anyone, into a form that is only readable and useful by the system of the present invention and such that the conversion may not be reversed. This protects against the risk that the original corpus may be reverse engineered from the transformed corpus. . . .
[0047] However, a content analysis as noted in quadrant 102 can also be done to determine whether the content of two corpuses is similar even though their structure may differ. The content analysis may use rare word searches to accomplish this function. In the embodiment discussed earlier with respect to source code files and computer programming, while computer programming languages have certain reserved words that are likely to be found in any source code file written in that language, it is not likely that variable names, function names, procedure names or comments will be shared across source code files unless they were written by the same person or unless one was written with the knowledge of the other. Thus, the variable name, function name, procedure name or comment could be the rare word that is searched for in both corpuses. If the rare word is found in both, then it is likely that portions of source code were copied but simply altered in their structural position in the document. For example, if one code file uses an "if-then" statement and another corpus uses a "case" statement, but the variables are the same in the two code files, then the resemblance will be detected by the content analysis using rare word searches. This may reveal that the second corpus code file was written in the presence of or with the knowledge of the first corpus, that the second corpus was written by someone who also wrote the first corpus, or that the second corpus is simply a rewrite of the first corpus.
[0048] Furthermore, and as illustrated in FIG. 1 at quadrant 103, while two corpuses may have the same structure, they may have different content. In this case, the system of the present invention may perform a spectral or histogram analysis to determine whether certain concepts are found in both documents despite being identified by different terms in the source code file. Thus, in the case of source code, structure could be an "if-then" statement used in both code files. However, if the two code files use different variable names within this same structure, the resemblance will not be detected either by a strict textual analysis or content analysis using rare word searches. However, the spectral analysis 103 will detect the presence of similar structure where the rare words, in this case the variable names, are different.
Ah! SCO's favorite words: "concepts", as in methods and concepts. Get it? And "spectral analysis". I think X marks the spot, all right. And so handy for finding patent infringement just when Microsoft is allegedly wanting to find some. Is that a dovetail, or what? Well, gang, you probably don't need me to tell you what I think the scheme was and is. But don't forget one salient truth: the examples SCO already offered, supposedly found by three teams of deep divers, including the "MIT" scientists, or... um, not *exactly* MIT scientists, or the used-to-be MIT-linked group, depending on which day SCO was blabbing, were shot down in a day:
In a slide show last Monday, SCO showed six examples of Linux files it says were illegally copied from its confidential Unix code.
Linux partisans who obtained a copy of the slide show were quick to trace the examples back to their origins, which appear preliminarily in each case not to belong to SCO.
The company disputes this analysis. “We’re the owners of the Unix (AT&T) System V code, and so we would know what it would look like,” the company told McMillan and the IDG News Service. “Until it comes to court, it’s going to be our word against theirs.”
Not exactly. Not only did the Linux community reveal the code wasn't useful for SCO's purposes, Judge Kimball also found nothing but a surprising lack of evidence. This patent may describe a system for code comparison, but it failed at the most important part, from SCO's standpoint: establishing who actually owns the code. Prior art, anyone? Here are the patent claims and description, for those who would like to read the entire thing:
*****************************
United States Patent Application: 20050216898
Kind Code: A1
Powell, G. Edward JR.; et al.
September 29, 2005
System for software source code comparison
Abstract
A system for analyzing similarities between a first and second corpus or
between a set of concepts and a corpus uses natural language processing
and machine intelligence methods to replace terms or phrases in the
corpus with concepts, determine the frequency of each concept in the
corpus, and convert the corpus into a concept frequency file to enable
easy comparison of the two corpuses or easy retrieval of items from the
corpus that contain a concept. Difference analysis and a combination of
content and spectral analysis may be employed.
Inventors:
Powell, G. Edward JR.; (Brentwood, TN)
Anderer, Michael; (Salt Lake City, UT)
Lane, Mark T.; (Franklin, TN)
White, N. Edward; (Austin, TX)
Correspondence Name and Address:
MCKENNA LONG & ALDRIDGE LLP
1900 K STREET, NW
WASHINGTON, DC 20006
US
Serial No.: 938844
Series Code: 10
Filed: September 13, 2004
U.S. Current Class: 717/141; 717/114
U.S. Class at Publication: 717/141; 717/114
Intern'l Class: G06F 009/45
Claims
What is claimed is:
1. A system for comparing at least a first corpus to a second corpus,
comprising: a profiler that characterizes each of said first corpus and
second corpus; an encryption engine respectively encrypting the first
corpus and the second corpus using a one-way transform; an analyzer
identifying concepts in the transformed corpuses, said analyzer
determining a frequency rating of said concepts in each corpus; for each
corpus, replacing each instance of each of said concepts on every line
with its respective frequency rating to create a frequency file; and a
comparator correlating the frequency file for the first corpus to the
frequency file for the second corpus.
2. The system of claim 1, wherein said profiler, encryption engine,
analyzer, and comparator are computer programs running on at least one
general purpose computer.
3. A system for searching a corpus of data objects, comprising: receiving
a list of concepts; relating at least one of said concepts to at least
one search term; searching each of said data objects for each of said
terms; and determining the correlation of at least one concept and at
least a second concept in said corpus of data objects based on the
presence of search terms relating to said first and search terms relating
to said second concept in the same data object.
4. The system of claim 3, wherein said corpus of data objects is a
database of documents.
5. The system of claim 4, wherein said receiving a list of concepts
comprises a computer program receiving a list of concepts posted from an
internet web page.
6. The system of claim 3, said determining the correlation further
comprises separately determining the correlation of each concept with
each other concept.
7. The system of claim 3, said determining the correlation further
comprises determining the correlation of each concept with three or more
other concepts.
Description
[0001] This application claims the benefit of U.S. Provisional Patent
Application No. 60/502,098, filed on Sep. 11, 2003, which is hereby
incorporated by reference for all purposes as if fully set forth herein.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates to data object comparison and
analysis, and in particular to software for comparing two or more data
objects to determine the extent of any similarities between them.
[0004] 2. Discussion of the Related Art
[0005] Companies increasingly rely on software to provide not only
products for consumers or their institutions, but also to manage their
day-to-day operations. Software code has therefore become a valuable
intellectual property (IP) asset.
[0006] The ever-increasing complexity of computer software programs as
well as tight development schedules force programmers to become more
efficient. One way for programmers to meet these challenges is by reusing
source code and adapting it to new applications rather than writing the
source code from scratch.
[0007] To this end, open source software has become increasingly popular.
Open source software is software source code that is publicly available
and freely downloadable from the Internet. Thus, open source software
code is a convenient resource for programmers looking to cut development
time by downloading it and merging it with their proprietary application.
In addition, the growth of the open source software movement may also
motivate computer programmers to donate or contribute software to the
open source movement that they have written but that is owned by their
employer. The problem is that most open source software, while freely
available for downloading is not in the public domain.
[0008] In particular, open source software is not unrestricted--to the
contrary it is often subject to licenses that restrict not only the open
source software code itself but any modification thereof and any software
that incorporates it as well. Typically, these open source licenses may
require that the source code of any proprietary system using some open
source software code be publicly disclosed. In other words, a programmer
who uses open source code in a proprietary application may
unintentionally subject that proprietary application to the constraints
and restrictions of an open source license. This may have devastating
effects on the ability of the company to protect software IP or pursue
further intellectual property protection for their software.
[0009] In addition, open source software has another inherent risk--it is
unknown to what extent open source software incorporates proprietary
technology owned by others. Thus, even if open source software is free of
any licensing restriction, such as open source software that is in fact
committed to the public domain, the possibility remains that the software
may infringe another's patents or property rights. A programmer who
incorporates this open source code into their proprietary application may
unintentionally subject his employer to unforeseen consequences such as
infringement litigation.
[0010] Furthermore, the rapid growth of the software industry has driven
many programmers and software engineers to change employers regularly and
often. There is a problem that as these workers move between jobs, they
may be taking proprietary source code that they wrote for a previous
employer with them to their new employment. Programmers may not be aware
or may not be sensitive to these concerns, and risk an inadvertent
technology transfer or intellectual property transfer.
[0011] In addition, as companies increasingly rely on overseas or offshore
development firms for software programming, there is a concern that the
overseas development company may be reusing source code that it wrote for
one client (who has rights to that software) for projects it works on
with other clients.
[0012] The problem is not limited to computer source code. In addition to
source code, design documents and technical specifications may be
indicative of patent infringement or may be used to invalidate patents.
But due to the relative ambiguity of terms of art in the software and
business methods fields as well as the non-technical nature of language
that is often used in patents, it is very difficult to assess IP risks
properly and efficiently.
[0013] These IP risks are more serious given the tight regulatory
environment in which companies operate. Corporate regulations, such as
those collectively known as "Sarbanes-Oxley", require that firms monitor
their intellectual property assets as well as the financial risks to
their business, perform regular IP and risk audits, and report the same to
their shareholders, regulators, and the public.
[0014] But given that programmers often modify source code slightly when
reusing it, it becomes difficult to perform IP software risk audits using
redline or other character-based comparison methods. Thus, what is needed
in the art is a multi-dimensional approach to comparing two or more
corpuses, such as source code, documents, file objects, collections of
data or file objects, or databases, that is able to determine the extent
to which one corpus resembles another even when the particular structure
or content of the two corpuses vary.
SUMMARY OF THE INVENTION
[0015] Accordingly, the present invention is directed to a system for
software source code comparison that substantially obviates one or more
of the problems due to limitations and disadvantages of the related art.
[0016] An advantage of the present invention is to provide a system for
comparing two corpuses to determine how they resemble one another.
[0017] Another advantage of the present invention is to provide a system,
software, and methods for analyzing at least two corpuses and determining
concepts contained in each and further determining the extent to which
the corpuses contain concepts in common.
[0018] Additional features and advantages of the invention will be set
forth in the description which follows, and in part will be apparent from
the description, or may be learned by practice of the invention. The
objectives and other advantages of the invention will be realized and
attained by the structure particularly pointed out in the written
description and claims hereof as well as the appended drawings.
[0019] To achieve these and other advantages and in accordance with the
purpose of the present invention, as embodied and broadly described, a
system for comparing at least a first corpus to a second corpus includes
a profiler that characterizes each of said first corpus and second
corpus; an encryption engine respectively encrypting the first corpus and
the second corpus using a one-way transform; an analyzer identifying
concepts in the transformed corpuses, said analyzer determining a
frequency rating of said concepts in each corpus, replacing each instance
of each of said concepts on every line with its respective frequency
rating to create a frequency file; and a comparator comparing the
frequency file for the first corpus to the frequency file for the second
corpus.
[0020] In another aspect of the present invention, a system for searching
a corpus of data objects includes: receiving a list of concepts; relating
at least one of said concepts to at least one search term; searching each
of said data objects for each of said terms; and determining the
correlation of at least one concept and at least a second concept in said
corpus of data objects based on the presence of search terms relating to
said first and search terms relating to said second concept in the same
data object.
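As a rough illustration of the search aspect described in paragraph [0020], the following Python sketch counts how often two concepts turn up in the same data object. It is not taken from the application: the function name `concept_cooccurrence`, the sample documents, and the concept-to-term table are all hypothetical, and a simple shared-document count stands in for whatever correlation measure the inventors had in mind.

```python
from collections import Counter
from itertools import combinations


def concept_cooccurrence(documents, concept_terms):
    """Count, for each pair of concepts, the documents in which both appear.

    documents: list of strings (the "data objects").
    concept_terms: dict mapping a concept name to its search terms
                   (a hypothetical table; the application publishes none).
    """
    pair_counts = Counter()
    for text in documents:
        words = set(text.lower().split())
        # A concept counts as present if any of its search terms occurs.
        present = sorted(
            concept
            for concept, terms in concept_terms.items()
            if any(term.lower() in words for term in terms)
        )
        for first, second in combinations(present, 2):
            pair_counts[(first, second)] += 1
    return pair_counts


docs = [
    "the scheduler uses a lock to protect the queue",
    "a mutex guards the run queue",
    "the filesystem caches inodes",
]
concepts = {
    "locking": ["lock", "mutex"],
    "queueing": ["queue"],
    "caching": ["caches"],
}
counts = concept_cooccurrence(docs, concepts)
```

Here "locking" and "queueing" are linked because terms for both occur in the first two documents, even though one document says "lock" and the other "mutex".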
[0021] It is to be understood that both the foregoing general description
and the following detailed description are exemplary and explanatory and
are intended to provide further explanation of the invention as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The accompanying drawings, which are included to provide a further
understanding of the invention and are incorporated in and constitute a
part of this specification, illustrate embodiments of the invention and
together with the description serve to explain the principles of the
invention.
[0023] In the drawings:
[0024] FIG. 1 is a diagram illustrating an aspect of a first exemplary
embodiment of the present invention.
[0025] FIG. 2A is a process diagram illustrating the system of the present
invention according to a first exemplary embodiment.
[0026] FIG. 2B is a process diagram illustrating profiling according to a
first exemplary embodiment of the present invention.
[0027] FIG. 3A illustrates sample histograms according to an aspect of a
first exemplary embodiment of the present invention.
[0028] FIG. 3B illustrates sample spectral extracts according to an aspect
of a first exemplary embodiment of the present invention.
[0029] FIG. 4 illustrates a sample correlation matrix according to an
aspect of a first exemplary embodiment of the present invention.
[0030] FIG. 5 illustrates a further embodiment of the present invention.
DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
[0031] Reference will now be made in detail to embodiments of the present
invention, examples of which are illustrated in the accompanying
drawings.
[0032] The system of the present invention models the conditional
probability that two (or more) corpuses have a similar combination of
characteristics. For example, the two corpuses may be software source
code bases composed of source code files, structured or unstructured
documents, patents, or technical disclosures. The characteristics
analyzed may be the structure and content of those code bases and source
code files, for example.
[0033] The system of the present invention analyzes and compares the
corpuses in such a way that they may be preprocessed without affecting
the comparison. In one exemplary embodiment, the corpus is transformed
using any one of a number of one-way transforms understood to those of
ordinary skill in the art, allowing the system of the present invention
to protect the proprietary, secure, confidential, or privileged nature of
the corpus and still allow it to be compared against another corpus. In
the alternative, proprietary one-way encryption transforms may be used.
[0034] For example, the present invention allows an owner of proprietary
code to submit their code to a website which compares the code against a
database of open source code bases. The database of open source code
bases may be an open source UNIX or Linux distribution, for example. In
this capacity, the system of the present invention would be used for
auditing proprietary code to determine if it contained open source
software, or if a particular open source software release contained
proprietary software. This audit could be scheduled to run on a periodic
basis automatically.
[0035] In another aspect, the database of open source code bases may
contain a number of popular open source applications of a certain type, such
as image manipulation or audio processing applications that may be
protected by trade secrets or patents. In this aspect, the input file may
be patent claims or design specifications containing concepts that are
compared against the concepts in the source code in the database. Thus,
while the structure of the two corpuses is different (patent claims on
the one hand and source code on the other) it is still possible according
to the system of the present invention to determine whether they share
concepts in common.
[0036] In these aspects of the invention, the need to keep the proprietary
corpus confidential is paramount. Thus, providing a one-way transform of
the proprietary corpus, using some form of or combination of natural
language processing, machine learning, and data encryption, minimizes the
risk of inadvertent disclosure of proprietary information. It is
necessary that the transform be one way (i.e. irreversible) to protect
the confidentiality of the corpus against the risk that the system on
which the comparison is run is compromised in some way, or that the
corpus is intercepted en route.
[0037] As noted earlier, the system is not limited to comparing source
code. The system may be adapted to compare compiled object code as well,
which is important in case of reverse engineering or infringement of
copyrights to software. Furthermore, the system may be adapted to
corpuses other than computer code.
[0038] For example, in another exemplary embodiment, the system of the
present invention may take search phrases for comparison against a target
corpus of patents, or patent claims for use in a patent search. As
discussed earlier, the system parses the search terms for concepts based
on natural language processing methods, and assigns raw power values
based on the frequency of the concept in the target corpus. In a further
aspect, the system may analyze each file in the target corpus (each
patent in a patent database, for example), and replace each instance of
each concept in the specifications with that concept's respective power.
In addition, or in the alternative, the concept may be replaced by its
power in the claims, in the case of an infringement analysis.
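The "frequency file" idea in paragraphs [0019] and [0038] can be pictured with a short Python sketch. Everything named here is an assumption made for illustration: the `TERM_TO_CONCEPT` table is invented, a concept's "power" is taken to be nothing more than its raw count in the corpus, and terms that map to no concept are simply dropped from each line.

```python
import re
from collections import Counter

# Hypothetical term-to-concept table; the application does not publish one.
TERM_TO_CONCEPT = {
    "lock": "LOCKING",
    "mutex": "LOCKING",
    "queue": "QUEUEING",
    "list": "QUEUEING",
}


def words_of(line):
    """Split a line into lowercase word tokens."""
    return re.findall(r"[a-z_]+", line.lower())


def concept_frequencies(lines):
    """Count how often each concept occurs across the whole corpus."""
    counts = Counter()
    for line in lines:
        for word in words_of(line):
            concept = TERM_TO_CONCEPT.get(word)
            if concept is not None:
                counts[concept] += 1
    return counts


def frequency_file(lines):
    """Replace each concept instance on every line with its corpus-wide count."""
    counts = concept_frequencies(lines)
    return [
        [counts[TERM_TO_CONCEPT[w]] for w in words_of(line) if w in TERM_TO_CONCEPT]
        for line in lines
    ]


corpus = ["take the lock", "append to the queue", "release the mutex and the lock"]
freq = frequency_file(corpus)
```

Two corpuses processed this way become rows of numbers, which is presumably what makes the later histogram and correlation steps possible without comparing the original words.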
[0039] In a first exemplary embodiment the present invention allows one
corpus to be compared against at least one other corpus. As noted, a
corpus may be any data object, file object, collection of data or file
objects or any type of structured or unstructured data or documents. This
includes source code files including both instructions and comments,
object code, text documents, structured documents such as spreadsheets,
word processing files, HTML or XML documents, or databases or collections
thereof.
[0040] In a first aspect of this invention, a first corpus (the source
corpus) is profiled and converted into a metadata file. Likewise, the
second, or target, corpus is profiled and converted into a target
metadata file. In this particular aspect, the profiling process includes
encrypting or otherwise transforming the corpus using a one-way
transform, and then characterizing the transformed corpus before
converting it into a metadata file.
[0041] FIG. 2A is a block diagram generally illustrating a system for
comparing two corpuses according to the present invention. Proprietary
intellectual property is taken at step 20 as input and transformed at
step 22 using natural language processing, machine intelligence, and
encryption. At step 24, the transformed proprietary property is
characterized as discussed herein and compared at step 26 with one or
more other characterized corpuses in the characterization database 28.
The profiling tool may perform multi-source characterizing and a one-way
transform. By making the transform a one-way transform the system will
protect the proprietary nature of the source code. If the source code
could be reverse engineered from the metadata file, very few companies
with proprietary source code would want to use the system for fear of
disclosing their source code to others. By making it a one-way transform,
they may be comfortable that their confidential information and source
code will be kept confidential.
[0042] Software source code B is taken as input at step 21 by a profiling
block 23 which performs profiling on the source code to produce a
metadata file B at step 25. The metadata files are then compared to one
another at step 26 and a report is generated at step 27. The report will
reflect how closely the two pieces of software resembled one another.
[0043] More particularly, the process of profiling the corpus involves a
multi-source characterization of that corpus along with a one-way
transform assigned to preserve the confidentiality, secrecy and integrity
of that original code document. Because the corpus may contain trade
secrets or other proprietary intellectual property information, it may be
necessary to use cryptographic methods to convert that corpus, which is
readable by anyone, into a form that is only readable and useful by the
system of the present invention and such that the conversion may not be
reversed. This protects against the risk that the original corpus may be
reverse engineered from the transformed corpus.
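The application does not say which one-way transform is used. As a sketch under that caveat, the Python below hashes each token with salted SHA-256 from the standard `hashlib` module, so that identical tokens still match after the transform while the original text is not stored. The function name `one_way_tokens` and the salt value are hypothetical.

```python
import hashlib
import re


def one_way_tokens(source, salt=b"example-salt"):
    """Replace every word token with a truncated salted SHA-256 digest.

    Equal tokens yield equal digests, so two transformed files can still
    be compared token by token. The salt here is an illustrative value.
    """
    tokens = re.findall(r"\w+", source)
    return [
        hashlib.sha256(salt + token.encode("utf-8")).hexdigest()[:16]
        for token in tokens
    ]


a = one_way_tokens("if (count > limit) reset(count);")
b = one_way_tokens("while (count < limit) count++;")

# Digests present in both files correspond to shared tokens.
shared = set(a) & set(b)
```

Note that hashing short identifiers one at a time is weak protection: anyone who can guess a candidate name and knows the salt can hash it and look for a match, so this shows the comparison-after-transform idea rather than a secure design.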
[0044] After the two corpuses are profiled and converted into respective
source and target metadata files, then the two metadata files are
compared to determine how closely they resemble each other. The details
of the multi-source characterization and the comparison will be discussed
below.
[0045] In a further aspect of the first exemplary embodiment, the corpus is
characterized by a structure and content. In other words, any data object
or file object will contain some inherent structure that organizes the
content stored within it. Thus, it is possible for two corpuses such as
source code files, for example, to have both similar structure and
content, different structure and content, similar structure with
different content, or different structure with similar content. A
two-by-two matrix showing the possible scenarios is illustrated in FIG.
1.
[0046] FIG. 1 illustrates the possible relationships between two corpuses.
The system of the present invention can perform a number of different
analyses on two corpuses to determine whether they resemble one another.
For example, to determine whether the two corpuses share content and
structure as noted in quadrant 101, ordinary text comparison programs
such as redline applications or text comparison commands, such as the
grep, diff or comm commands found in the UNIX operating system, may be
used. This will reveal whether or not sections of the corpus are
identical in structure and content.
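The quadrant 101 case, identical structure and content, is what ordinary diff-style tools already handle. A minimal Python sketch using the standard library's `difflib.SequenceMatcher` follows; the helper name `identical_blocks`, the two-line threshold, and the sample files are assumptions for illustration.

```python
import difflib


def identical_blocks(a_lines, b_lines, min_lines=2):
    """Return runs of at least min_lines consecutive lines found in both files."""
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [
        a_lines[match.a:match.a + match.size]
        for match in matcher.get_matching_blocks()
        if match.size >= min_lines
    ]


file_a = [
    "int i;",
    "for (i = 0; i < n; i++)",
    "    total += v[i];",
    "return total;",
]
file_b = [
    "int k;",
    "for (i = 0; i < n; i++)",
    "    total += v[i];",
    "return 0;",
]
blocks = identical_blocks(file_a, file_b)
```

A comparison like this finds only verbatim runs; renaming a single variable on each line would defeat it, which is the gap the content and spectral analyses are meant to fill.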
[0047] However, a content analysis as noted in quadrant 102 can also be
done to determine whether the content of two corpuses is similar even
though their structure may differ. The content analysis may use rare word
searches to accomplish this function. In the embodiment discussed earlier
with respect to source code files and computer programming, while
computer programming languages have certain reserved words that are
likely to be found in any source code file written in that language, it
is not likely that variable names, function names, procedure names or
comments will be shared across source code files unless they were written
by the same person or unless one was written with the knowledge of the
other. Thus, the variable name, function name, procedure name or comment
could be the rare word that is searched for in both corpuses. If the rare
word is found in both, then it is likely that portions of source code
were copied but simply altered in their structural position in the
document. For example, if one code file uses an "if-then" statement and
another corpus uses a "case" statement, but the variables are the same in
the two code files, then the resemblance will be detected by the content
analysis using rare word searches. This may reveal that the second corpus
code file was written in the presence of or with the knowledge of the
first corpus, that the second corpus was written by someone who also
wrote the first corpus, or that the second corpus is simply a rewrite of
the first corpus.
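A rare word search of the kind described in [0047] could look roughly like the following. This is a sketch under assumptions: the reserved word list is a small hypothetical sample, and identifiers are found with a simple regular expression rather than a real parser:

```python
import re

# Hypothetical sample of reserved words; a real list depends on the language.
RESERVED = {"if", "then", "else", "case", "switch", "for", "while",
            "return", "int", "break", "default"}

def identifiers(source):
    """Set of identifier-like tokens that are not reserved words."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    return {t for t in tokens if t not in RESERVED}

def shared_rare_words(source_a, source_b, well_known=frozenset()):
    """Identifiers present in both sources, minus names known to be common."""
    return sorted((identifiers(source_a) & identifiers(source_b)) - well_known)
```

The `well_known` parameter is there because a shared name is only weak evidence: identifiers that come from published standards or common convention would need to be excluded before drawing any conclusion.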
[0048] Furthermore, and as illustrated in FIG. 1 at quadrant 103, while
two corpuses may have the same structure, they may have different
content. In this case, the system of the present invention may perform a
spectral or histogram analysis to determine whether certain concepts are
found in both documents despite being identified by different terms in
the source code file. Thus, in the case of source code, structure could
be an "if-then" statement used in both code files. However, if the two
code files use different variable names within this same structure, the
resemblance will not be detected either by a strict textual analysis or
content analysis using rare word searches. However, the spectral analysis
103 will detect the presence of similar structure where the rare words,
in this case the variable names, are different.
[0049] Finally, there may be instances that fall into the fourth quadrant
104 of FIG. 1, where both the structure of the document and the content
of the document are different. This is where it is necessary to provide
human IP or intellectual property thread analysis. In other words, human
readable documents such as manuals, read-me files, message board
postings, news group postings, chat transcripts, resumes, press releases,
journal articles, and marketing materials or the like are reviewed to
determine whether people involved in creating the first corpus were at a
different time working with the company that wrote the second
corpus. In the alternative, such documentary analysis may reveal that
authors of the first and second corpus knew each other, were familiar
with one another, or worked together or somehow came in contact with each
other.
[0050] The spectral analysis will now be discussed in detail. In an aspect
of the first exemplary embodiment, the corpuses being compared are source
code. To characterize source code according to the present invention,
each file in the code base is processed as illustrated in FIG. 2B. The
processing involves stripping away any comments, white spaces or
programming language-specific characters, such as the asterisk, the
ampersand, semicolon, comma, for example, in step 202. It is understood
by one of ordinary skill in the art that a different type of corpus such
as a text document, XML document or HTML document will have different
characters that are specific delimiters in that type of corpus.
[0051] After this information has been removed, at step 204, concept
information is gathered from the source code files in the code base
corpus. Concept information is gathered by first producing a raw concept
file at step 206 which retains the line structure and that records the
concepts in those lines in a dictionary file. Next, the raw power of each
concept is determined at step 208. The raw power is the number of times
that the concept is used in the entire code base.
[0052] After the raw power of each concept in the code base is determined,
a raw concept frequency file for each source file in the code base is
produced at step 210. This raw concept frequency file records the
concepts on each line of the file by replacing the concepts on the line
with their respective raw power values. After step 210, the system of the
present invention according to this particular exemplary embodiment
assigns a frequency or power number to every term used in the code file
at step 212.
[0053] Thus, for each line in the file each concept is translated into the
power of that concept from the corpus dictionary that was created
earlier. For example, a line containing a number of different concepts
would be replaced by a sequance such as 2363:12:300:41, for example, in
which the numbers are the power numbers of the concepts and the colons
are delimiters used to separate different concepts on the same line.
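Steps 202 through 212 as described in [0050] to [0053] can be sketched as below. This is one reading of the text, not the applicants' code: "concept" is taken to mean any identifier-like token, only single-line C-style comments are stripped, and all function names are made up for illustration:

```python
import re
from collections import Counter

def concepts(line):
    """Strip single-line C-style comments, then return identifier-like tokens."""
    line = re.sub(r"/\*.*?\*/|//.*", "", line)
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line)

def raw_power(files):
    """Count how often each concept occurs across the whole code base."""
    counts = Counter()
    for text in files.values():
        for line in text.splitlines():
            counts.update(concepts(line))
    return counts

def encode(text, power):
    """Replace the concepts on each line with their colon-separated powers."""
    return [":".join(str(power[c]) for c in concepts(line))
            for line in text.splitlines()]
```

With a code base of two tiny files, `x = y + x; /* add */` followed by `return x;` in one and `y = x;` in the other, `x` has power 4, `y` has power 2 and `return` has power 1, so the first file encodes as `4:2:4` and `1:4`.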
[0054] After this stage, spectral summary charts may be created as
illustrated in FIG. 3A. The spectral summary chart reports on the
similarities between the two code files A and B by providing graphs 301
and 302 of the histograms or spectrum of each of code files A and B,
respectively, based on the name of the file, the number of files and number
of lines in the file, the number of distinct concepts used in the file
and the total power of the lines in the file. This can then be plotted
and displayed in an ordinary bar chart format as illustrated in FIG. 3 in
which the horizontal axis is the line number of the file and the vertical
axis is the total aggregate power of that line from the concept
dictionary. By looking at the spectral charts of the two files being
compared, one can see immediately whether or not the files contain
similar concepts because each line in each file will be replaced by a bar
on which the concept values in that line are plotted. The similarities
between the two files become obvious.
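The "spectrum" in [0054] appears to be nothing more than a per-line sum of power numbers. A minimal sketch, assuming each line has already been encoded as colon-separated powers; the text bar chart is only a stand-in for FIG. 3A, which is not reproduced here:

```python
def line_spectrum(encoded_lines):
    """Total aggregate power per line; empty lines count as zero."""
    return [sum(int(p) for p in line.split(":") if p)
            for line in encoded_lines]

def text_bars(spectrum, width=40):
    """Render the spectrum as rows of '#' scaled to the largest value."""
    peak = max(spectrum, default=0) or 1
    return ["#" * round(width * value / peak) for value in spectrum]
```

The example line from [0053], `2363:12:300:41`, sums to 2716, so a single very common concept dominates the bar for that line.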
[0055] Furthermore, as illustrated in FIG. 3B, a spectral extract can be
obtained in which portions of a histogram from one file can be compared
against the histogram of the other file to see if there are sections of
the histograms that match exactly. This can be used to determine whether
or not entire sections of source code were duplicated in concept if not
in precise exact character matching. In other words, because source code
that accomplishes the exact same thing can be written in different ways,
it is necessary to determine to what extent the source code is written
using the same variable names, the same functions and the same order or
using the same programming styles which under ordinary circumstances
would differ significantly from one programmer to another. Thus, if
sections of the code display similar identical concepts, it is very
likely that source code has been duplicated and only modified slightly.
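The application does not say how sections of two histograms are matched. One plausible reading is a longest common contiguous run over the two per-line spectra, sketched here with a standard dynamic-programming pass; the function name is hypothetical:

```python
def longest_common_run(spec_a, spec_b):
    """Return (start_a, start_b, length) of the longest identical run."""
    best = (0, 0, 0)
    prev = [0] * (len(spec_b) + 1)
    for i, a in enumerate(spec_a, 1):
        cur = [0] * (len(spec_b) + 1)
        for j, b in enumerate(spec_b, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend the run ending just before
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best
```

Note that two unrelated lines can add up to the same total power, so a matching run of totals is a hint to look closer, not proof that the underlying lines are the same.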
[0056] In a further embodiment of the present invention, the content
analysis and spectral analysis may be further extended to analyze patents
and patent claims for invalidity or infringement purposes. In other
words, while an intellectual property document such as a patent or a
design specification may include terms used to convey a concept, it is
understood that there are other terms that may be used as synonyms for
that same concept. This is particularly the case in software and business
method patents where there are few industry standard terms of art, or in
which the terms of art have ambiguous meanings and are used loosely by
those in the art.
[0057] Thus, the system for the present invention may have at its disposal
a corpus dictionary that is either predefined for a specific field of
knowledge in which the corpus (the patent, in this example) resides or it
may have a dictionary that is constructed ad hoc as part of the
analytical process using the first and second corpus to produce the
corpus dictionary of key concepts.
[0058] In addition, the concepts may be used to determine the extent to
which concepts are highly correlated in a corpus. Consider an example in
which the correlation of a number of biomedical concepts in patents is
sought. In this example, ten concepts 400, "neuromodulation", "brain
imag*", "cord stimulat*", "nerve stimulat*", "vivo magnetic resonance",
"Interventional Magnetic Resonance or Interventional MR", "brain
stimulat*", "intralaminar nucle*", "sympathetic or parasympathetic", and
"corpus callosum", are entered into system of the present invention. (The
* denotes a wildcard operator). The system, using natural language
processing methods understood in the art, searches a set of patents or
all patents for instances of the concepts (using terms from the concept
dictionary synonymous with the concept). The system returns a grid such
as that illustrated in FIG. 4, with the concepts 401 listed vertically
along the left side and the correlated concepts 402 listed along the top.
The number of patents 403 found containing each concept is returned and
displayed along with the concept 401 at the left. Then the system
correlates each concept with each of the other concepts and displays as a
percentage 404 of the total patents 403 found containing the first
concept alone the number of patents containing both search terms
together. If implemented as a hypertext document or world-wide-web page,
the percentage 404 can be selected to reveal the list of the patents
having the respective concepts.
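The grid in [0058] reduces to counting co-occurrence. A small sketch, assuming plain substring matching with `*` as a trailing wildcard and ignoring the synonym dictionary the application mentions; `matches` and `correlation_grid` are illustrative names:

```python
import re

def matches(concept, text):
    """True if the concept occurs in text; '*' matches any word characters."""
    pattern = r"\w*".join(re.escape(part) for part in concept.split("*"))
    return re.search(pattern, text, re.IGNORECASE) is not None

def correlation_grid(docs, concept_list):
    """Return (hits, grid): documents per concept, and pairwise percentages.

    grid[(first, second)] is the share of documents containing `first`
    that also contain `second`, as a percentage.
    """
    hits = {c: {doc_id for doc_id, text in docs.items() if matches(c, text)}
            for c in concept_list}
    grid = {}
    for first in concept_list:
        total = len(hits[first])
        for second in concept_list:
            if first != second:
                both = len(hits[first] & hits[second])
                grid[(first, second)] = 100.0 * both / total if total else 0.0
    return hits, grid
```

The grid is not symmetric: the percentage is taken against the count for the row concept, as the paragraph describes, so swapping the two concepts generally gives a different number.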
[0059] This embodiment is not limited to a two dimensional grid. In
alternative aspects of this embodiment, a multidimensional array
N1 × N2 × ... × Ni returns the correlation of any of
concepts 1 through i with any number greater than or equal to two of the
other concepts 1 through i. Conceptually, a 10 × 10 × 10 cube
would store the correlations of three of the ten concepts listed above.
It will be understood to those of skill in the art at the time of the
invention that the system of the present invention may be implemented in
any number of ways.
[0060] For example and as illustrated in FIG. 5, the present invention may
be implemented as a site on the internet which aggregates publicly
available documents on the internet, such as source code or patents, onto
databases residing on its own system which are used for the comparison.
In another example, the present invention may periodically access open
source code bases or patent databases across the internet and compare
them against proprietary code that is stored on its databases and servers
to provide periodic IP monitoring and auditing.
[0061] It will be apparent to those skilled in the art that various
modifications and variation can be made in the present invention without
departing from the spirit or scope of the invention. Thus, it is intended
that the present invention cover the modifications and variations of this
invention provided they come within the scope of the appended claims and
their equivalents.
|
|
Authored by: TheBlueSkyRanger on Friday, September 30 2005 @ 04:16 AM EDT |
Please remember to keep links clickable.
Dobre utka,
The Blue Sky Ranger
|
- OT Here, Please - Authored by: jmc on Friday, September 30 2005 @ 04:38 AM EDT
- South Africa: M$ XML software patent opposition headed for court? - Authored by: Anonymous on Friday, September 30 2005 @ 06:14 AM EDT
- SCO Contracts - Authored by: ff5166 on Friday, September 30 2005 @ 06:31 AM EDT
- OT Here, Please - Authored by: Jimbob0i0 on Friday, September 30 2005 @ 09:39 AM EDT
- Bad for Massachusetts? - Authored by: Anonymous on Friday, September 30 2005 @ 10:42 AM EDT
- OT Here, Please - Authored by: Anonymous on Friday, September 30 2005 @ 11:24 AM EDT
- M$ Glossary - Authored by: Anonymous on Friday, September 30 2005 @ 01:19 PM EDT
- OT Here, Please - Authored by: J.F. on Friday, September 30 2005 @ 01:48 PM EDT
- Comickey goodness - Authored by: Anonymous on Friday, September 30 2005 @ 10:36 AM EDT
- Patent Fud anyone - Authored by: emacsuser on Friday, September 30 2005 @ 11:45 AM EDT
- Not me! I'm clean! - Authored by: Ted Powell on Friday, September 30 2005 @ 01:02 PM EDT
- Chris Sontag's emWare is MIA - Authored by: qu1j0t3 on Monday, October 03 2005 @ 05:22 AM EDT
|
Authored by: TheBlueSkyRanger on Friday, September 30 2005 @ 04:17 AM EDT |
If needed.
Dobre utka,
The Blue Sky Ranger
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 04:54 AM EDT |
Does this kind of sweeping generalisation...
[0007] ...open source software movement may also motivate computer programmers to donate or contribute software to the open source movement that they have written but that is owned by their employer. The problem is that most open source software, while freely available for downloading is not in the public domain.
... really tick you off?
If a statement that is not clearly supported by fact, reference or example is included in a patent, does it go some way to diluting the potential for the patent to be accepted?
It clearly displays the alleged inventors' bias against the open source movement.
It also shows to me what a farce the software patent thing is - no doubt the system to do this hasn't been coded and proved to actually function - and IBM have probably done better with their own code comparison experts - so correct me if I'm wrong, but this is nothing more than a patent application for vapourware?
Charles from Oz
|
- loony assertion no. 19456: most open source code stolen from employers? - Authored by: Anonymous on Friday, September 30 2005 @ 05:38 AM EDT
- loony assertion no. 19456: most open source code stolen from employers? - Authored by: inode_buddha on Friday, September 30 2005 @ 05:54 AM EDT
- loony assertion no. 19456: most open source code stolen from employers? - Authored by: Anonymous on Friday, September 30 2005 @ 06:24 AM EDT
- Hmm, but it's true. - Authored by: Dark on Friday, September 30 2005 @ 07:02 AM EDT
- No it's not - Authored by: Anonymous on Friday, September 30 2005 @ 08:56 AM EDT
- loony assertion no. 19456: most open source code stolen from employers? - Authored by: eskild on Friday, September 30 2005 @ 10:26 AM EDT
- loony assertion no. 19456: most open source code stolen from employers? - Authored by: frk3 on Friday, September 30 2005 @ 10:27 AM EDT
- Rebuttal done already - Authored by: Anonymous on Friday, September 30 2005 @ 10:52 AM EDT
- FUD is everywhere - Authored by: Anonymous on Friday, September 30 2005 @ 12:38 PM EDT
- Correction: no code is ever stolen - Authored by: Anonymous on Friday, September 30 2005 @ 01:33 PM EDT
- loony assertion no. 19456: - Authored by: Anonymous on Friday, September 30 2005 @ 01:58 PM EDT
|
Authored by: Nonad on Friday, September 30 2005 @ 05:00 AM EDT |
Sorry, I couldn't resist popping this in here again...
Red Meat take off...
|
|
Authored by: Totosplatz on Friday, September 30 2005 @ 05:03 AM EDT |
Blue-sky rumination Lemelson-style "invention" - toss in every kind of idea plus the kitchen sink...
From now on any kind of system that compares something to some other thing in a repository somewhere will "infringe" the patent on this "invention." With this description in hand, can one of these beasts actually be implemented? "Spectral" analysis on the 'concepts' present? In what exact century? After encryption has rendered it all into mush? Give me a break!
Toss out the USPTO! Clearly not up to the task. Either that or this is some kind of grad-student prank!
--- All the best to one and all.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 05:15 AM EDT |
The process of profiling the corpus involves a multi-source characterization of that corpus along with a one-way transform assigned to preserve the confidentiality, secrecy and integrity of that original code document.
For previous art on this, how about Eric Raymond's comparator program, or my own comparison tool. Both of them perform a one-way transform on source code, and both were published well before September 13, 2004.
Warren
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 05:24 AM EDT |
Could anybody comment on whether this patent has any merit?
From reading it I get the impression that it is complete hogwash, pseudo-scientific rambling. As such it reminds me of the Alan Sokal hoax (use google).
Several things I find rather suspect:
The text speaks of comparing UNIX source code to Open Source code. Funny that the wording is tailored to newSCO's needs. Why, if one has a method for comparing UNIX source code to Open Source code, could this method not be applied to any two pieces of source code? Why on earth would one not make the patent as broad as possible?
The problem of deciding whether two pieces of code have the same semantics, i.e., perform the same task, is undecidable: no procedure can exist that within a finite amount of time tells one whether or not the two pieces of code have the same semantics. And this patent application pretends to be able to do even better: whether one piece of code implements a patent or not.
Spectral analysis and source code comparison don't go hand in hand. This patent does not tell me how they could. For finding structural similarities one would rather apply a pretty straightforward technique: compile the code to a suitable representation (a tree), optimize, and then compare.
I got the distinct feeling that this patent is applied for just to underpin the FUD ("We used patented technology to establish blah blah infringement blah blah Linux blah blah patents blah blah <more McBride & Stowell speak here>")
|
|
Authored by: caliboss on Friday, September 30 2005 @ 05:41 AM EDT |
Seriously...Mr. Stats_For_all is one fellow I would never want on the other side of a dispute.
Is there some secret sauce here that you use or can you give us all some insights, tips, tricks, cluesticks as to how you soooo effectively eviscerate SCO before they even have a chance to exhale?
What's your research setup look like? What data sources do you use and trust? I'm amazed.
--- Grok the Law / Rock the World
|
|
Authored by: inode_buddha on Friday, September 30 2005 @ 05:42 AM EDT |
Memo to Mike Anderer: The "ultimate guarantor" already exists in the
form of 17 USC. It doesn't need any help from anyone.
---
-inode_buddha
Copyright info in bio
"When we speak of free software,
we are referring to freedom, not price"
-- Richard M. Stallman
|
|
Authored by: eamacnaghten on Friday, September 30 2005 @ 05:56 AM EDT |
Prior art, anyone?
How similar is ESR's Comparator? Released (therefore published) under GPL in 2003 I think...
Link.
Web Sig: Eddy Currents
|
|
Authored by: thorpie on Friday, September 30 2005 @ 05:56 AM EDT |
Now if they are using open source source code in their program, even as an input, do they have to comply with the GPL??
I'd really like to know
--- The memories of a man in his old age are the deeds of a man in his prime - Floyd, Pink
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 06:29 AM EDT |
If I understand this correctly that this is about corpus analysis then we are in
the realms of "natural language processing" - which, among many other
things - is computational analysis of large textual works to glean meaning - new
words for dictionaries, eg., OED, are identified in this way.
US Groklawers could track outputs from this discipline via DARPA and a body
whose name escapes me (National Science Foundation? - that does all the science
funding)
EU Groklawers could track down work from the Commission's Telematics programme.
Universities and companies have done a lot of government funded work here.
UK Groklawers could check with the Research Councils or possibly the Department
of Trade and Industry - including asking about old annual reports of the JFIT
programme and or SALT.
For those of you in Universities - nip along to your (e.g., English) Language
departments, rather than Computational Sciences.
UK Groklawers in Universities could try English Language departments in
Imperial, Oxford, UCL, Birmingham, Lancaster, Essex, UCE, Edinburgh
HTH
|
|
Authored by: cmc on Friday, September 30 2005 @ 06:54 AM EDT |
I seem to recall at least two companies popping up recently that claim to do code comparison (compare submitted code against databases of known code), specifically looking for IP infringement. I just did some googling, and one of those companies is Black Duck Software. Here's an article about their service in eWeek. From that article:
"Black Duck Software is now providing its source-code checking software as an Internet-based service. The software, ProtexIP, is used to check for any potential violations of open-source or other software licenses within source code. Now, customers can subscribe to Black Duck's knowledge base through its ProtexIP/OnDemand service rather than installing their own servers on-site."
This Infoworld article (dated May 14, 2004) mentions the launch of the ProtexIP software and what it does (code comparison). This is the packaged product, available long before the web service was launched, and even before Mr. Anderer's patent application was filed.
Might this ProtexIP be prior art, or are the two unrelated (legally-speaking)? Specifically, does it look like ProtexIP (and others that do the same thing) are infringing this Anderer patent?
cmc
|
- Prior art? - Authored by: Anonymous on Friday, September 30 2005 @ 09:00 AM EDT
- Prior art? - Authored by: Anonymous on Friday, September 30 2005 @ 02:25 PM EDT
|
Authored by: Anonymous on Friday, September 30 2005 @ 06:59 AM EDT |
Uhm...
Blackduck Software
Doesn't this count as prior art? Or are they connected somehow?
|
|
Authored by: alextangent on Friday, September 30 2005 @ 07:44 AM EDT |
This patent is complete rubbish; the method outlined for comparison is not described in any detail and is not implementable in any universe of which I am a resident. For instance, talking about the protection of the target for comparison, the patent states:
[0036] In these aspects of the invention, the need to keep the proprietary corpus confidential is paramount. Thus, providing a one-way transform of the proprietary corpus, using some form of or combination of natural language processing, machine learning, and data encryption, minimizes the risk of inadvertent disclosure of proprietary information. It is necessary that the transform be one way (i.e. irreversible) to protect the confidentiality of the corpus against the risk that the system on which the comparison is run is compromised in some way, or that the corpus is intercept en route.
How is the comparison to be performed if the target is irreversibly encrypted?
[0037] As noted earlier, the system is not limited to comparing source code. The system may be adapted to compare compiled object code as well, which is important in case of reverse engineering or infringement of copyrights to software. Furthermore, the system may be adapted to corpuses other than computer code.
Really? Do these guys understand reverse engineering or compiler technology at all?
[0053] Thus, for each line in the file each concept is translated into the power of that concept from the corpus dictionary that was created earlier. For example, a line containing a number of different concepts would be replaced by a sequance [sic] such as 2363:12:300:41, for example, in which the numbers are the power numbers of the concepts and the colons are delimiters used to separate different concepts on the same line.
I had to stop there. The rest is nothing more than pseudo-computing mumbo jumbo and a lot of hand waving. Perhaps someone else has the patience to dissect this nonsense; I don't. Unfortunately, it appears the patent examiner didn't have the patience either.
|
|
Authored by: LouS on Friday, September 30 2005 @ 07:46 AM EDT |
The idea of comparing texts by taking histograms of word usage is a very old one in natural language information retrieval. Just for one example see
Lewis, D. D.; Yang, Y.; Rose, T.; and Li, F. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5:361-397, 2004.
http://www.jmlr.org/papers/volume5/lewis04a/lewis04a.pdf
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 08:04 AM EDT |
Does the process of discovery and code comparison which took place in the SCO-IBM litigation constitute prior art?
If this patent is actually granted, will this not limit the ability of copyright holders of Free/Open source software to prove a claim of infringement by non-free software vendors? -- Because evidence in such litigation will involve comparison of Free/Open source code with proprietary source code?
Is the latter the real intention behind filing this patent?
Does comparison of closed and open sourced code involve any kind of innovation?
Is not code comparison already done by Free software utilities like diff?
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 08:12 AM EDT |
What a bunch of do-nothing get rich quick paper shufflers.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 08:15 AM EDT |
If this patent can get through the system then it's time to patent everything!
Does not matter if it works, just write it down!
Let's see...
I'll patent an invention that turns water into wine. I don't need to explain the process, just state a few quick claims and voila! Instant patent!
What this all boils down to is a patent overhaul. All companies have a choice between "trade secrets" and "patents". Take Coke as an example: they chose to keep the formula for their tasty beverage as a trade secret; if they had gone the patent route the formula would be common knowledge. As soon as you file the patent the information on how to replicate/create what you are applying a patent for should be in the patent.
You can't just write down some jargon and say, "I just patented cold fusion!" You must state, in easily understandable instructions, how your patent actually works. Not what it does, but the nuts and bolts of how it does what it does. That is the trade-off when you patent as opposed to when you keep something as a trade secret.
For those who say, "I have to patent or I lose my legal ownership." Hogwash! If someone acquired the formula for Coke and tried to patent it tomorrow they would not get it as there is prior art available at your local store! The formula would lose its trade secret status, but that's about it. (well except for a bunch of lawsuits and finger-pointing.)
I propose the patent office go back to the way it used to be with complete drawings/plans/schematics required for ALL applications. If you can not show the patent office EXACTLY how to make the patent work (I'm not talking about a slideshow but hard evidence/prototype/source code) then the application should be rejected out of hand.
Without such a requirement all patents are a joke.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 08:44 AM EDT |
First thanks to stats for this gem.
Now for the other readers here I want you to cast your minds back in time.
First we have the MIT scientists, spectral analysis and code comparison.
We have references here to one way encryption to protect the closed source code.
We have a buddy of D McB behind this patent. Not first named now which might or might not be by itself significant here.
The patent makes extensive reference to Linux and Unix.
The filing date is interesting also - 2004.
Let's make a guess at dot connection here.
Back to 2002. Yarro sees that buying the Unix divisions isn't helping. He comes up with plan B - to use the control of Unix code to initiate a law suit against Linux. He needs a mouthpiece to sell this idea: it is never intended to go to court note - just create a stink to get the company out of Canopy so he can make yet another land grab for Canopy.
D McB is approached. He rings up his friend Anderer. Anderer figures that he can code something with a few friends that will pull up a few code similarities between Unix and Linux. There will be +ves - grep, cron etc - the question is how many?
D McB goes back to Yarro and says: "Yes we may have something here. We [Anderer et al] develop a code comparison system that will pick out similarities between Unix and Linux for you. I [D McB] will sell this idea for you. In return I get to hire my buddies."
The deal is done.
The one way encryption is the Greek text stuff (remember that?). The spectral analysis is applied to the histograms of the word counts. (This is a genuine technique but its interpretation is often tricky at best.)
The missing money that stats also found probably relates to this use of this new idea by MS. MS wanted this idea for the FUD machine. The SCO deal was the first go at making this work. If this had worked this patented idea would now be in use on a wide scale by MS to create at least more FUD.
Now we may have found why Anderer thought they might be able to get more money out of MS. If for example OO could be shown to share code with (say) MS Office OO would be now deep in the weeds - assuming that this patent held up in court etc. This is why the patent was never assigned to SCO: SCO was to be bought out by IBM and the patent would have gone with it.
My guess on why Anderer is not the first named on the patent is because of his visibility in this case. It might also relate to his contribution of course to the patent but I'm just being 'paranoid' here.
The results of this method would be convincing to a non techie - think D McB, Laura DiDio, possibly M' OG & Enderle - who knows who saw the results of this programme of Anderer's. Needless to say it would pull out sections of similar code: cron, grep, the interfaces etc which would be most impressive. So impressive that Yarro hired D McB in fact.
D McB is no techie. Yarro is no techie. Neither are copyright lawyers. The ex MIT CVs would look good. The maths qualifications look good. They run this past the in house lawyers. They decide to run with it starting out with Kevin McBride (D McB's brother) in a state court where they think they have a better chance of success.
Then it goes pear shaped for them. IBM want to fight. The case gets transferred. Instead of being a cheap and easy case they have to hire expensive lawyers (BSF). The evidence is adequate at first blush for BSF to think there might just be a case. SCO's methods won't stand up in court so they are forced to try manual comparisons.
MS are watching the case carefully. If the method had stood up in court or if IBM had bought out SCO there would now be mountains of claims based on this patent which almost certainly would have been bought by MS for their own use against IBM and anyone else that had dared to touch FOSS code.
This by itself IMHO justifies IBM's decision to fight this case all the way. Otherwise they would have been sitting on a lot more trouble in the future.
--
MadScientist
|
- Mike Anderer, "MIT" Scientists, Spectral Analysis, and a Patent Application - Authored by: Steve Martin on Friday, September 30 2005 @ 08:49 AM EDT
- Thank you - Authored by: Anonymous on Friday, September 30 2005 @ 09:09 AM EDT
- meant to add - Authored by: Anonymous on Friday, September 30 2005 @ 09:31 AM EDT
- Mike Anderer, "MIT" Scientists, Spectral Analysis, and a Patent Application - Authored by: John Hasler on Friday, September 30 2005 @ 11:15 AM EDT
- Timeline - Authored by: The Mad Hatter r on Friday, September 30 2005 @ 08:19 PM EDT
- The HP Memo. - Authored by: Anonymous on Friday, September 30 2005 @ 09:59 PM EDT
- Another date - 2.2 seems to be OK just as M$ said. - Authored by: Anonymous on Friday, September 30 2005 @ 10:47 PM EDT
- Summary of a stats post on Y! - Authored by: Anonymous on Sunday, October 02 2005 @ 09:47 AM EDT
|
Authored by: rsteinmetz70112 on Friday, September 30 2005 @ 09:11 AM EDT |
This patent application seems specifically designed to buttress SCOG's claims
and to provide a basis for getting the inventors recognized as experts in the
field of software comparison. In other words the patent claims seem designed
specifically for this litigation.
I picked out this gem "it is not likely that variable names, function
names, procedure names or comments will be shared across source code files
unless they were written by the same person or unless one was written with the
knowledge of the other." This seems almost directly from SCOG's claims. It
also curiously omits the possibility of published standards, common practice,
industry conventions or shared (but not copied) experience (say everyone went to
MIT).
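To illustrate the omission, here is a minimal Python sketch (not the method in the application, whose implementation is not disclosed) that measures identifier overlap between two snippets. Both snippets are hypothetical and are assumed to have been written independently against the same published API; the keyword list and the Jaccard measure are my own illustrative choices.

```python
import re

# A few C keywords to ignore (illustrative, not exhaustive).
C_KEYWORDS = {"int", "char", "if", "else", "return", "const", "void", "for", "while"}

def identifiers(source):
    """Return the set of identifier-like tokens in a C-style snippet, minus keywords."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)) - C_KEYWORDS

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical snippets, assumed to be written independently,
# both following the published POSIX open/read/close interface.
snippet_a = """
int fd = open(path, O_RDONLY);
if (fd < 0) { perror("open"); return -1; }
ssize_t n = read(fd, buf, sizeof(buf));
close(fd);
"""
snippet_b = """
int fd = open(filename, O_RDONLY);
if (fd < 0) { perror("open"); return errno; }
ssize_t n = read(fd, buf, sizeof buf);
close(fd);
"""

ids_a = identifiers(snippet_a)
ids_b = identifiers(snippet_b)
shared = ids_a & ids_b
score = jaccard(ids_a, ids_b)
print(sorted(shared))
print(round(score, 2))
```

Most of the shared names come straight from the standard interface and common habit (fd, buf, n), so a high score here says nothing about who copied whom.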
If these guys testify in the IBM case, I can't wait for IBM to ask for the
source code so their experts can determine the validity of the method.
---
Rsteinmetz - IANAL therefore my opinions are illegal.
"I could be wrong now, but I don't think so."
Randy Newman - The Title Theme from Monk
|
|
Authored by: rand on Friday, September 30 2005 @ 09:42 AM EDT |
..."proving" that two things are similar doesn't prove that B was copied from
A.
1. If you encrypt the data properly (squint and wear colored glasses),
oranges look identical to apples.
2. In the case of source code, it's most probable that both A and B were copied
from some prior source (Adm. Hopper and Knuth come to mind). The invention might
work, after a fashion, but to use it to demonstrate that a certain piece of
open-source software was copied from proprietary* code would first require that
the proprietary* code be compared to every piece of software previously written
by anyone, anywhere, in any language.
*Again we see the usage of "proprietary" and "IP"/"intellectual property".
Remember, "proprietary" just means someone owns or claims something, as opposed
to "public domain"; in that regard most open-source software is still
proprietary (otherwise, GPL wouldn't work!).
Remember also, there is no intellectual "property", there are only limited
rights: the right to copy (copyright), the right of exclusive manufacture
(patent), the right of exclusive identification (trademark), and the right to
protect trade secrets.
---
The wise man is not embarrassed or angered by lies, only disappointed. (IANAL
and so forth and so on)
|
|
Authored by: dwheeler on Friday, September 30 2005 @ 10:05 AM EDT |
I have not had time to read this patent application, but I did look for the
original filing date. At the top it says 2004; it then references an earlier
filing of Sep. 11, 2003.
That is a strikingly interesting date. You see, Eric Raymond released his
comparator tool to the world on Sep. 7, 2003. See this news file. His comparator
tool seems to be prior art, so this patent is probably invalid just on that
basis. Indeed, I know that Raymond's is not the first tool to do this (there are
others far older), so this patent may be DOA.
Speculation: It may very well be that this patent-applier read about Raymond's
tool, and then quickly wrote a patent application to cover what his tool did
after the tool had been publicly released to the world. If that is what
happened, this isn't just invalid, it's unethical.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 10:10 AM EDT |
Okay, I'll admit I didn't read the entire patent application...but, it sure
sounds like he's just taken Bayesian logic to tokenize a submission and compare
it against a database of known previously tokenized data just like Bayesian spam
filters do...only on code instead of e-mail.
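For anyone who hasn't looked inside a Bayesian spam filter, here is a rough Python sketch of the idea applied to code tokens. It is a generic naive Bayes classifier with add-one smoothing, offered as an assumption about what "tokenize and compare against a database" could mean; the labels and training snippets are made up, and nothing here is taken from the application itself.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split source text into identifier-like tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)

class NaiveBayes:
    """Tiny multinomial naive Bayes over tokens, with add-one smoothing."""

    def __init__(self):
        self.counts = {}       # label -> Counter of token frequencies
        self.docs = Counter()  # label -> number of training documents

    def train(self, label, text):
        self.counts.setdefault(label, Counter()).update(tokenize(text))
        self.docs[label] += 1

    def log_score(self, label, text):
        vocab = set().union(*self.counts.values())
        counts = self.counts[label]
        total = sum(counts.values())
        # Log prior plus the sum of smoothed log likelihoods for each token.
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokenize(text):
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        return score

    def classify(self, text):
        return max(self.counts, key=lambda label: self.log_score(label, text))

# Hypothetical "database" of previously tokenized code, one entry per label.
nb = NaiveBayes()
nb.train("scheduler", "struct task *next = runqueue_pick(rq); schedule(next);")
nb.train("filesystem", "struct inode *ino = lookup(dir, name); read_inode(ino);")

guess = nb.classify("schedule(runqueue_pick(rq));")
print(guess)
```

Note that the classifier always returns its best label, however poor the fit, which is exactly why a "match" from this kind of tool proves very little on its own.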
|
|
Authored by: John Hasler on Friday, September 30 2005 @ 10:25 AM EDT |
Filed: September 13, 2004
[0001] This application claims the benefit of U.S. Provisional Patent
Application No. 60/502,098, filed on Sep. 11, 2003...
I thought provisional patents were only good for one year.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 10:47 AM EDT |
This reminds me of some past scams where somebody gets a patent on
"something" and then uses the patent to attract investors, stock
analysts, bankers, and the even less informed public.
It is a "magic asset" that is used to inflate the value of a company
that is otherwise worthless. Think perpetual motion machine.
The best part is that it never really needs to work -- just give the appearance
of working.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 11:26 AM EDT |
Stats has an amazing ability to parse large amounts of information and pull out
really interesting bits.
If I were on IBM's or Novell's legal team I would seriously think about bringing
Stats in, perhaps as a contractor, and let him see all the real dirt on SCO (from
discovery and subpoenas) and not just information available from public sources
alone.
And then when it's all over, he could write a book with all the gory details and
I'll be the first in line to get it.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 12:52 PM EDT |
I don't really care about the patent aspect of this (getting US patents seems
easier than applying for a credit card these days) but I do care about the
business model that they seem to be promoting (and which could be used even if
the patent were rejected).
Basically they seem to be building a service where anyone with a few lines of
source code will be able to submit it and have it compared to the entire body of
OSS work to find any 'matches'. They'll then get a report of those 'matches' and
head off down to the local court house and try playing the SCO game. What's the
bet that you could put nearly any code at all in there and it would come up with
a 'match' of some kind? Seems like it's tailor-made to create a storm of
nuisance lawsuits against OSS devs / distros etc. Perhaps it could even kill
some projects or distros by bogging them down in legal battles.
|
|
Authored by: mac586 on Friday, September 30 2005 @ 12:54 PM EDT |
Another exciting piece of evidence that reveals the inner workings of SCO's
"Dream Team".
When Darl and company commented on how much they desired a courtroom and a jury,
they never imagined that such a trial was going to take place online, in a
blog.
How can SCO even imagine a day in court? These days, they must be imagining
years in a federal prison.
|
|
Authored by: gtall on Friday, September 30 2005 @ 01:10 PM EDT |
He is replacing "terms or phrases in the corpus with concepts".
So, how would this work for, say, "i = 0"?
(1) the term "i" is replaced with the concept of bounded integer.
(2) the term "i" is replaced with the concept of an index.
(3) the term "i" is replaced with the concept of a step in an
induction (base case).
and so on. Let's forget about the mental dexterity required of one to replace a
word with a concept (quick, write down what the concept of a triangle
is...please be sure to reference Kant...now that you've written it down, are the
words you wrote the concept as a Ding an sich?), there's no stopping the list of
concepts. And this is going to be done by a computer, no less, that doesn't even
understand natural language...assuming a machine understanding natural language
even makes sense.
Gerry
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 02:03 PM EDT |
I don't understand this.
AFAICT this is considered to be a "published" patent application and because
it's already "published" it's too late to file a protest with the USPTO. How
were we supposed to learn about this patent and protest it BEFORE it was listed
on the USPTO website?
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 02:13 PM EDT |
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION
TREATY.... WIPO
What are the implications of this?
Brian S.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 03:56 PM EDT |
Since the stated purpose of this application is to check proprietary code
against open source, its only value may be to alert proprietary companies to
dangers they face from pilfered open source code.
|
|
Authored by: Fredric on Friday, September 30 2005 @ 05:42 PM EDT |
I am not a patent expert by far but I have read a couple of patents and to me
this "patent application" does not look like anything I have seen. (We have on
occasion seen Microsoft patents on Groklaw, read them and note the difference).
First, as some people pointed out, I can find no implementation. A patent should
contain a description of how to implement the innovation; in this case it could
be code or possibly free text. (Disclaimer: I could not persuade my browser to
show the images in the patent so theoretically they could be the description but
from the text I would expect a lot of boxes and arrows and no algorithms).
AFAIK you cannot patent an idea or concept (SCO would like that!) like a
time-machine or the idea to create a web site to compare source code.
Second, the text does not look like something written by a patent attorney. I
could of course be wrong but all patents I have read are much more precise and
do not contain phrases like "it will be apparent to those skilled in the art" or
similar (one can easily believe that the author does not consider himself
"skilled in the art" ;-). Patent applications are usually much more difficult to
understand because there is no reasoning or discussion and the reader is assumed
to be not only skilled but an expert. This "patent application" feels more like
a lecture than a patent application.
Anyway, this does not look real to me. I could not understand if the patent was
granted. It would really surprise me if it was, but then, I am not a patent
attorney so I could be wrong here and still get to keep my job.
Is there anybody around that can tell me if this really is a typical US SW
patent application or does Mr. Anderer's elevator not really reach the top floor
(if that translates).
---
/Fredric Fredricson
--------
[Funny sig temporarily removed for tests on Salisbury Plain]
|
|
Authored by: rdc3 on Friday, September 30 2005 @ 08:04 PM EDT |
As described in section 1134.01 of the Manual of Patent Examining Procedure, the
USPTO allows a two-month period for submission of prior art in the form of
publications or patents, once an application is published.
Up to 10 publications or patents per submission may be filed. No accompanying
explanations or highlighting are allowed. The fee is $180.
There is thus a very limited opportunity to file prior art to be considered by
the patent examiner.
|
|
Authored by: Anonymous on Friday, September 30 2005 @ 11:03 PM EDT |
Directing your attention to Claim 3, which I paraphrase as "searching a set
of documents for a set of keywords and generating some figure of merit for the
goodness of the match."
It would appear that Google, and presumably every other search engine that
generates goodness figures, is now an infringing application.
And you thought that SCO was only out after the Unix-like operating system
domain. I think that they had even bigger game in their sights!
Cheers!
|
|
Authored by: Walter Dnes on Saturday, October 01 2005 @ 12:05 AM EDT |
- Many patents (i.e. "process patents" and "business method
patents" and "software patents") cover "methods and
concepts".
- Anderer's application covers comparison on "methods and
concepts".
- A patent search involves comparing the text (source code) of patents to look
for the same "methods and concepts".
So the granting of this patent would ultimately mean that anyone, *INCLUDING
THE PATENT OFFICE*, doing a patent search would be infringing on this patent.
Another item to remember is that the first reaction by the Open Source community
to junk patents, is a search for prior art, looking for the same methods and
concepts. *SEARCHING FOR PRIOR ART AGAINST A PATENT GRANT WOULD BE AN
INFRINGEMENT OF THIS PATENT*!!!
If this patent is granted, the US patent system ...
1) is truly b0rk3n
2) will literally be 0wn3d... by Anderer
|
|
Authored by: Anonymous on Saturday, October 01 2005 @ 10:02 AM EDT |
"In this aspect, the input file may be patent claims or design specifications
containing concepts that are compared against the concepts in the source code in
the database. Thus, while the structure of the two corpuses is different (patent
claims on the one hand and source code on the other) it is still possible
according to the system of the present invention to determine whether they share
concepts in common."
Declare two variables named "said" and "aforementioned", and this brilliant
natural language processing system will conclude you violate every single patent
in history.
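The joke can be made concrete with a small Python sketch. It assumes the crudest possible "shared concepts" measure, plain term overlap between claim text and source code; the claim sentence, the code snippets, and the scoring function are all hypothetical, and the application may well do something more elaborate.

```python
import re

def terms(text):
    """Lower-cased set of word-like terms in a piece of text."""
    return {t.lower() for t in re.findall(r"[A-Za-z_]+", text)}

def overlap(claim, code):
    """Fraction of the claim's distinct terms that also appear in the code."""
    claim_terms = terms(claim)
    if not claim_terms:
        return 0.0
    return len(claim_terms & terms(code)) / len(claim_terms)

# Hypothetical claim language and two hypothetical code snippets.
claim = "A method wherein said processor stores the aforementioned value in said memory."
plain_code = "int total = compute(x);"
padded_code = "int said = 0; int aforementioned = 0; int total = compute(x);"

plain_score = overlap(claim, plain_code)
padded_score = overlap(claim, padded_code)
print(plain_score, padded_score)
```

The code does the same work in both cases; only the variable names changed, yet the second snippet now "shares concepts" with the claim.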
|
|
Authored by: Anonymous on Saturday, October 01 2005 @ 11:29 AM EDT |
From the Anderer email:
"4) On the patent side for IPX, where foes that fit it. I am working with
the lawyers to get these moved from provisional to more complete in the next
week. I think it will spawn at least 3 patents. Ed and I are the inventors on
these."
ESR guessed that it referred to http://www.ipxonline.com/, "a company that
exists to help set up patent litigation". I haven't seen any link to this
company yet. And I'd hate to think we're going to assume that since the email
used the words patent, IPX and lawyers in the same sentence that we are
beholden to search engines for the bulk of our intelligence.
I'm not all-knowing either. But, wouldn't it be interesting to depose some of
the people involved and ask them if the "system for software code
comparison" was called IPX within their little clique?
|
- What is IPX? - Authored by: Anonymous on Saturday, October 01 2005 @ 01:13 PM EDT
|
Authored by: The Cornishman on Monday, October 03 2005 @ 08:38 AM EDT |
> ...if our guess that it's the same man proves to be accurate
A literal translation of Anderer from the German: "Other". It's
clearly not the same man :)
---
(c) assigned to PJ
|
- thank you - Authored by: Anonymous on Monday, October 03 2005 @ 09:49 AM EDT
|
|
|
|