May 03, 2003
Reflections on the 25th Anniversary of Spam
While many only encountered spam (junk e-mail or junk newsgroup postings) in
            the mid 1990s, my research has found it goes back much further than that.
In fact, the earliest documented junk e-mailing I've uncovered was sent May 3,
            1978 -- 25 years ago last Saturday. (It was written May 1 but sent on
            May 3.) And in a surprising coincidence (*),
            just a month ago marked the 10th anniversary of March 31, 1993, the first
            time a USENET posting got named a spam.
(Note: You can hear me (briefly) interviewed about this anniversary on
            the May 2 edition of NPR's "All Things Considered.")
I learned of that first spam through a report from Einar Stefferud who read
            a history
            I prepared of the term "spam" and how the name of the canned ham became our
            name for junk e-mail. I had originally set out to research the history of
            the term, but it became impossible not to research a bit of the history
            of the act.
That first spam was sent by a marketer for DEC - Digital Equipment Corporation.
            Today, you may not know DEC, since it was bought by Compaq and is now a unit
            of HP, but in those days it was the leading minicomputer maker, and its
            computers provided the platform for the development of Unix, C and much of
            the internet, to cite just a few minor events.
By 1978 the Arpanet (as the internet was then known) had already provided
            network E-mail to a large number of folks at universities and government
            institutions for over 6 years. E-mail was the biggest source
            of traffic on the Arpanet. A few years prior, Dave Farber had created
            "MsgGroup," the first network mailing list. (Though Plato and other
            timesharing systems had laid the foundations for online community and
            conferencing some years before that.)
The DEC marketer, Gary Thuerk, identified only as "THUERK at DEC-MARLBORO"
            (there were no dots or dot-coms in those days, and the at-sign was often
            spelled out), decided to send a notice to everybody on the Arpanet on
            the west coast. In those days there was a printed directory of everybody
            on the Arpanet, which they used as the source for the list.
            The message trumpeted an open house to show off new models of
            the DEC-20 computer, a foray into larger, almost mainframe-sized systems.
This was a spam, though the term would not be used to refer to it for
            another 15 years. Thuerk had his technical associate, early DEC
            employee Carl Gartley, send the message from his account after several edits.
            Alas, at first he didn't do it right. The Tops-20 mail program would only
            take 320 addresses, so all the other addresses overflowed into the body of
            the message. When they found that some customers hadn't got it,
            they re-sent to the rest.
As you can guess there was quite a response, with (as is typical) far more
            volume of debate than actual spam. It's amusing to see that one
            future celebrity --
            a young free software guru Richard Stallman -- at first wondered why people
            were so upset about the message. He later said the mistaken placement of
            all the addresses into the body did bother him, but he gets the dubious
            honour of being perhaps the first spam defender. Of course like all of us
            he was 25 years younger and the problem was brand new.
In those days the Arpanet had an official "acceptable use policy" which
            limited it to use in support of research and education. So this message
            was a pretty clear violation, and the DCA, which ran the Arpanet, gave
            a very stern call to Thuerk's boss about the matter.
            The policy was well enough known over time that we would not see significant
            spam for many years to come after that.
More detailed history
You can read my history of the term spam and
            how it came to mean abuse of the net.
You can also just go directly to the spam and
            the reaction to it, as well as more from my recent conversation with
            Gary Thuerk. You can also go directly to Stallman's defence of spam.
This site contains a number of essays on the spam problem, which I have
            been studying for many years, trying to find solutions which don't destroy
            the core values embodied in the mail system. In spite of what some may feel,
            we wanted an extremely cheap e-mail system where anybody could mail anybody,
            which protected anonymous communication and fostered values like free speech
            and the ability to send unsolicited communication. Those are not bugs, so fighting
            spam while keeping those values, along with other core social goals,
            is a delicate task.
You can read my current best plan to end spam if
            your interests lie that way. Other essays can be found at my
            spam essay page.
Escalation of the battle
Spam fascinates me because it sits at the intersection of three important
            rights -- free speech, private property and privacy. It's also the first
            major internet governance issue (possibly in tandem with DNS) that the
            members of the internet community have been so deeply concerned with.
The reaction to it has been remarkable. By attacking something we hold
            dear, and goading us by using our own tools and resources to do it, spam
            generates emotion far beyond its actual harm, even though that
            actual harm is quite considerable.
Spam pushes people who would proudly (and correctly) trumpet how we
            shouldn't blame ISPs for offensive web sites, copyright violations and/or
            MP3 trading done by downstream customers to suddenly call for blacklisting of
            all the innocent users at an ISP if a spammer is to be found among them.
            People who would defend the end-to-end principle of internet design eagerly
            hunt for mechanisms of centralized control to stop it. Those who would
            never agree with punishing the innocent to find the guilty in any other
            field happily advocate it to stop spam. Some conclude even entire nations
            must be blacklisted from sending E-mail.
            Onetime defenders of an open net
            with anonymous participation call for authentication certificates on every
            E-mail. Former champions of flat-fee unlimited net access who railed against
            proposals for per-packet internet pricing propose per-message
            usage fees on E-mail. On USENET, where the idea of canceling another's
            article to retroactively moderate a group was highly reviled, people now
            find they couldn't use the net without it. Those who railed at any attempt
            to regulate internet traffic by the government loudly petition their
            legislators for some law, any law it almost seems, against spam. Software
            engineers who would be fired for building a system that drops traffic on
            the floor without reporting the error change their mail systems to silently
            discard mail after mail.
It's amazing.
Dozens of anti-spam companies have sprung up in the past few years, offering
            a range of solutions including content-based filtering, blacklists,
            collaborative filtering, spamtrap detection and removal, e-stamps and
            some bulk detection. Remarkably, one new company called Habeas (trying an
            )
            is selling not a spam-blocking service, but a magic trademarked term that
            will let legitimate mailing list owners get past the collateral damage caused
            by existing spam-blocking tools. Their product is to get you past the
            spam filters.
Attempts to nail down a definition of spam seem to always end in quagmire.
            Each party to the debate seems determined to make sure that everything
            they think is spam be included in the definition, lest one spam slip through,
            but of course also keen that nothing they don't think is spam be blocked.
            Reconciliation seems near impossible.
The solutions
Here's a brief summary of some of the current active methods and proposals
            and how effective they are.
Content-based filters
Filters have a big advantage because they only need to be installed at the
            receiver. Some of the latest filtering tools, like SpamAssassin and the
            latest Bayesian algorithms are doing quite well in terms of the amount of
            spam they stop. However, they all have "false positives" which means they
            falsely identify real mail as spam, and block it. Most filters have no
            way to identify that mail was sent in bulk (the core requirement to spot
            a spam) and thus must rely on finding common patterns used by spammers.
The hand-tuned filters need regular updating by people. The learning filters
            adjust automatically but only by letting some spam through.
In terms of effectiveness, these are 2nd only to challenge/response tools.
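To make the idea of a learning filter concrete, here is a minimal sketch of a naive Bayes word-frequency scorer. It is an illustration only, not how SpamAssassin or any particular product works. The class name `TinyBayesFilter`, the whitespace tokenizer and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency spam scorer (hypothetical, for illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"; tokenizing on whitespace is a simplification.
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed fraction of training messages carrying this label.
            log_p = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing so an unseen word does not zero out the score.
                count = self.word_counts[label][word] + 1
                log_p += math.log(count / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into P(spam | text) without underflow.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)
```

A receiver would train this on mail already sorted by hand and then hold anything scoring above some cutoff. Wherever that cutoff sits, some real mail will land on the wrong side of it, which is the false positive problem described above.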
Blacklists
There are many competing blacklists, some of strong ethics, others more
            dubious. Nonetheless all rely on blocking mail from accused or confirmed
            spammers, with debate over the standards of proof and the definitions of
            spam. Some have gone so far as to blacklist entire ISPs or even nations.
            Almost everybody who runs a mail server, it seems, has a story about getting
            on a blacklist and having to figure out how to get off, if they were able to.
Blacklists certainly do scare ISPs, and the blacklisting of open relay servers
            had, over the course of many years, done a lot to get people to close up
            their relays (at the cost of making it harder for roaming users to send
            E-mail.)
Collaborative filters
These filters, such as Vipul's Razor (now via Cloudmark) rely on the first
            poor souls who get a spam reporting it to a central server. As the reports
            come in, the spam can be identified and rules can be written to block it.
            These are reasonably effective, and go after bulk, which is good. They have
            fewer false positives if done well. They are very similar to...
Spamtrap filters
These are primarily used by Brightmail Inc., which is probably the largest
            commercial anti-spam operation. Brightmail maintains huge numbers of
            addresses seeded onto spammer lists. When messages arrive, they are almost
            surely spam, and human beings look at them to derive rules to filter out and
            retroactively delete the messages. Very few false positives, but
            unfortunately reportedly only about 60-70% effective.
Challenge/Response
Dear to my heart because to the best of my knowledge, I wrote the first of
            these, a never-productized program called Viking-12. These tools know all your existing
            contacts, and when they receive mail from a new correspondent, they send
            out a "challenge" E-mail that asks the mailer to do something to confirm
            they are a real human being and not a spammer. When they do, the held
            mail is automatically delivered and they are on the good list from then on.
These tools are extremely effective; only a few spammers ever respond to the
            challenges. However, for various reasons some legitimate correspondents
            also don't respond, so it is necessary to browse the list of messages that
            never got a response to quickly search for the real messages. However, they
            are few, and they usually have low spam scores when used in combination with
            filtering tools. This can get the false positive rate extremely low.
Challenge/response without scanning the non-respondents blocks anonymous
            mail.
Today several companies offer them, and there are free software projects
            like TMDA which perform this function. A number of research projects have
            developed what could be termed "Turing tests" for the challenges, to assure
            that the respondent is a human being.
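The challenge/response flow described in this section can be sketched in a few lines. This is a toy model under stated assumptions, not the design of Viking-12, TMDA or any commercial product. The class `ChallengeResponseGate`, its method names and the random-token challenge are all hypothetical.

```python
import secrets


class ChallengeResponseGate:
    """Toy challenge/response mail gate (hypothetical, for illustration only)."""

    def __init__(self, known_senders):
        self.known = set(known_senders)  # existing contacts, the "good list"
        self.held = {}                   # challenge token -> (sender, message)

    def receive(self, sender, message):
        """Return ('deliver', message) for known senders, else ('challenge', token)."""
        if sender in self.known:
            return ("deliver", message)
        # New correspondent: hold the mail and issue a token to send back to them.
        token = secrets.token_hex(8)
        self.held[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """The sender answered: release the held mail and add them to the good list."""
        if token not in self.held:
            return None
        sender, message = self.held.pop(token)
        self.known.add(sender)
        return message

    def unanswered(self):
        """Held mail that never got a response, to be browsed for real messages."""
        return list(self.held.values())
```

The `unanswered` list matters as much as the challenge itself. Scanning it, ideally sorted by a filter's spam score, is what keeps legitimate non-respondents and anonymous mail from being silently lost.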
ISP bulk detection
A number of large ISPs, AOL in particular, have their own spam detectors
            which rely on the fact that due to their size, they have so many addresses
            that any spam attack is sure to arrive multiple times. They can thus detect
            these and get rid of them. A good approach, but past history shows some
            nasty false positives, with ordinary mailing lists being blocked for their
            volume. One notorious case involved AOL blocking acceptance letters from
            Harvard, which were sent out as a highly desired mass mailing.
This is a worthwhile technique but needs to be done with more care. Today's
            collateral damage is too high.
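As a rough sketch of bulk detection, a large receiver can fingerprint each message body and count how many distinct local recipients have been sent the same thing. Everything here is an assumption for illustration: the names `fingerprint` and `BulkDetector`, the SHA-256 hash over a whitespace-normalized body, the threshold, and the sender allowlist. The allowlist is one conceivable way to spare wanted mass mailings, and does not describe what any ISP actually does.

```python
import hashlib
from collections import defaultdict


def fingerprint(body):
    # Normalize case and whitespace so trivial variations hash the same.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BulkDetector:
    """Toy bulk-mail detector (hypothetical, for illustration only)."""

    def __init__(self, threshold, allowed_senders=()):
        self.threshold = threshold
        self.allowed = set(allowed_senders)  # assumed allowlist of wanted bulk senders
        self.recipients = defaultdict(set)   # fingerprint -> distinct recipients seen

    def observe(self, sender, recipient, body):
        """Record one delivery; return True once the body has reached enough recipients."""
        if sender in self.allowed:
            return False
        seen = self.recipients[fingerprint(body)]
        seen.add(recipient)
        return len(seen) >= self.threshold
```

An exact hash like this is easy to evade by varying each copy slightly, so a real system would need fuzzier matching. Volume alone also cannot tell spam from a mailing list people asked for, which is where the collateral damage comes from.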
Spam-banning laws
There have been many proposed anti-spam laws, and indeed around half of
            all U.S. states have such laws -- California has two! While most of
            these state laws will eventually be declared unconstitutional since it is
            important that states not have the power to regulate something as
            geography independent as E-mail, what can't be disputed is that they are
            having essentially no effect. There have been very few prosecutions under
            them, and spam levels continue to increase tremendously. Some hold out
            more hope for a U.S. federal law; however, an increasing percentage of spam
            comes from overseas. Advocates hope that even overseas spam can be stopped
            by a federal law if a U.S. connection can be found. Fellow EFF board member
            Larry Lessig argues that a law which pays a bounty to those who hunt
            down the U.S. connection on any spam without a mandatory label could do the
            trick.
Torts
There's been a fair bit of success against big institutional spammers in
            tort law. AOL and other companies have sued spamming companies using a
            variety of torts to shut them down. So far, alas, like Whack-a-moles, other
            spammers keep coming up. However, there have also been disturbing trends
            in the tort area. For example, Intel has sued an ex-employee who spammed
            Intel's entire employee base with his grievances against the company using
            a legal doctrine called "trespass to chattels." Unfortunately, the
            consequences of declaring E-mail to be trespass are even nastier than spam.
A large number of spams are already illegal, of course, amounting to
            confidence tricks or illegal selling of prescription drugs. Some of those
            laws are being used against the spammers too.
Opt-out lists
Recently, a federal do-not-call list was instituted in the USA to stop phone
            spam. Unfortunately, doing the same for E-mail
            is difficult and faces the same problems all laws would.
Hiding your address
The most common technique today seems to be hiding your E-mail address so
            that it can't be harvested by spammers. Unfortunately, by using dictionary
            attacks, they are managing to spam people who have never exposed their
            E-mail in public. I consider this desire to never reveal your E-mail
            one of the greatest damages done by spammers, so I don't view hiding as
            a great solution to the problem.
Vigilante attack
Some anti-spammers have resorted to harassment and even extra-legal
            efforts against spammers. They make a great tale to tell, but so far do
            not seem to be stemming the tide. And they have all sorts of nasty
            backbite, since they amount to sinking to or below the level of the enemy.
Up and coming solutions
E-Stamps
This idea is regularly re-generated.
            I first proposed it myself back in 1995 and later
            came to reject it. The idea is to put some low (or routinely not collected)
            cost on sending an E-mail that does not bother ordinary senders, but stops
            spam from being cost-effective. It has many advocates, and might work
            if it could be universally adopted. However, it suffers from a "you can't
            get there from here" problem. Until people are offering stamps with their
            E-mail, you can't demand them, and they have little incentive to offer them
            if nobody is demanding them. This technique could only be built by
            piggybacking on other techniques, such as doing challenge/response and offering
            stamps as a means to bypass the challenge.
Throttling bulk volume from unaccountable addresses
My current favoured proposal, detailed here.
Authentication
A number of people on anti-spam lists propose putting an authentication
            regime into E-mail, to the extent that one could refuse mail that was
            not digitally signed or otherwise verifiable. This would stop forging
            return addresses or the use of non-existent return addresses. A number
            of laws also address this.
Such schemes unfortunately abandon the longtime goal of an open E-mail system
            without central management (such as a certificate authority) which allows
            anonymous speech.
The Future
The spam problem will get worse before it gets better. Spammers will
            try new and nastier techniques to get around the blockers, and the blockers
            will try new and improved technologies. Spammers are already moving to
            even nastier techniques, such as using worm programs, or exploiting windows
            in widely deployed software systems to take over other people's machines and
            get them to do the spamming. It is rumoured that some spammers are using
            some of the wide number of open wireless networks to drive up to a building
            and spam using the network inside. Such tactics can't be countered
            with blacklists, for example, though they are fortunately highly illegal.
However the spam problem is solved, or partially solved, it will remain
            fascinating as the internet community grapples with its first serious
            abuse issue from within. Most other abuse issues have involved outsiders,
            ranging from the religious conservatives trying to ban smut to the RIAA trying
            to stop file-sharing, trying to regulate the net. Spam has caused the
            network insiders themselves to seek to regulate it.
This is important because it will, of course, not be the last such issue.
            How we manage ourselves here will be an indicator of things to come.
I hope that as we do this we will remember the principles that make free
            societies free, and the principles upon which the internet was built.
            End-to-end, open designs. The ability for anybody to communicate with
            anybody, even without an invitation. Ubiquitous, deliberately low-cost
            communications that are not accounted for on a packet-by-packet basis.
In addition, we must realize that though all internet traffic flows over
            private property, this does not mean we should forget constitutional
            principles like the U.S. 1st amendment. As I view it, the 1st amendment
            isn't just the law, it's a good idea. We owe a duty to preserve the
            values it contains -- and the long history of how to protect them that
            is embodied in 1st amendment jurisprudence -- as we architect the communications
            systems of the future. For in building and regulating the internet,
            we are doing no less than creating the primary platform for speech and
            the press of the new century.
That is not a task to be taken lightly.
Posted by glenn at May 3, 2003 07:36 PM