net127: Reflections on the 25th Anniversary of Spam

May 03, 2003

While many only encountered spam (junk e-mail or junk newsgroup postings) in
the mid 1990s, my research has found it goes back much further than that.

In fact, the earliest documented junk e-mailing I've uncovered was sent May 3,
1978 -- 25 years ago last Saturday. (It was written May 1 but sent on
May 3.) And in a surprising coincidence (*),
just a month ago marked the 10th anniversary of March 31, 1993, the first
time a USENET posting got named a spam.

(Note: You can hear me (briefly) interviewed about this anniversary on
the May 2 edition of NPR's "All Things Considered.")

I learned of that first spam through a report from Einar Stefferud, who read
a history I prepared of the term "spam" and how the name of the canned ham
became our name for junk e-mail. I had originally set out to research the
history of the term, but it became impossible not to research a bit of the
history of the act.

That first spam was sent by a marketer for DEC - Digital Equipment Corporation.
Today, you may not know DEC, since it was bought by Compaq and is now a unit
of HP, but in those days it was the leading minicomputer maker, and its
computers provided the platform for the development of Unix, C and much of
the internet, to cite just a few minor events.

By 1978 the Arpanet (as the internet was then known) had already provided
network E-mail to a large number of folks at universities and government
institutions for over 6 years. E-mail was the biggest source
of traffic on the Arpanet. A few years prior, Dave Farber had created
"MsgGroup," the first network mailing list. (Though Plato and other
timesharing systems had laid the foundations for online community and
conferencing some years before that.)

The DEC marketer, Gary Thuerk, identified only as "THUERK at DEC-MARLBORO"
(there were no dots or dot-coms in those days, and the at-sign was often
spelled out), decided to send a notice to everybody on the Arpanet on
the west coast. In those days there was a printed directory of everybody
on the Arpanet, which they used as the source for the list.
The message trumpeted an open house to show off new models of
the DEC-20 computer, a foray into larger, almost mainframe-sized systems.

This was a spam, though the term would not be used to refer to it for
another 15 years. Thuerk had his technical associate, early DEC
employee Carl Gartley, send the message from his account after several edits.
Alas, at first he didn't do it right. The Tops-20 mail program would only
take 320 addresses, so all the other addresses overflowed into the body of
the message. When they found that some customers hadn't got it,
they re-sent to the rest.

As you can guess there was quite a response, with (as is typical) far more
volume of debate than actual spam. It's amusing to see that one
future celebrity --
a young free software guru Richard Stallman -- at first wondered why people
were so upset about the message. He later said the mistaken placement of
all the addresses into the body did bother him, but he gets the dubious
honour of being perhaps the first spam defender. Of course like all of us
he was 25 years younger and the problem was brand new.

In those days the Arpanet had an official "acceptable use policy" which
limited it to use in support of research and education. So this message
was a pretty clear violation, and the DCA, which ran the Arpanet, gave
a very stern call to Thuerk's boss about the matter.
The policy was well enough known over time that we would not see significant
spam for many years to come after that.

More detailed history

You can read my history of the term spam and
how it came to mean abuse of the net.

You can also just go directly to the spam and
the reaction to it, as well as more from my recent conversation with
Gary Thuerk. You can also go directly to Stallman's defence of spam.

This site contains a number of essays on the spam problem, which I have
been studying for many years, trying to find solutions which don't destroy
the core values embodied in the mail system. In spite of what some may feel,
we wanted an extremely cheap e-mail system where anybody could mail anybody,
which protected anonymous communication and fostered values like free speech
and the ability to do unsolicited communication. Those are not bugs, so fighting
spam while keeping those values, along with other core social goals,
is a delicate task.

You can read my current best plan to end spam if
your interests lie that way. Other essays can be found at my
spam essay page.

Escalation of the battle

Spam fascinates me because it sits at the intersection of three important
rights -- free speech, private property and privacy. It's also the first
major internet governance issue (possibly in tandem with DNS) that the
members of the internet community have been so deeply concerned with.

The reaction to it has been remarkable. By attacking something we hold
dear, and goading us by using our own tools and resources to do it, spam
generates emotion far beyond its actual harm, even though that
actual harm is quite considerable.

Spam pushes people who would proudly (and correctly) trumpet how we
shouldn't blame ISPs for offensive web sites, copyright violations and/or
MP3 trading done by downstream customers to suddenly call for blacklisting of
all the innocent users at an ISP if a spammer is to be found among them.
People who would defend the end-to-end principle of internet design eagerly
hunt for mechanisms of centralized control to stop it. Those who would
never agree with punishing the innocent to find the guilty in any other
field happily advocate it to stop spam. Some conclude even entire nations
must be blacklisted from sending E-mail.
Onetime defenders of an open net
with anonymous participation call for authentication certificates on every
E-mail. Former champions of flat-fee unlimited net access who railed against
proposals for per-packet internet pricing propose per-message
usage fees on E-mail. On USENET, where the idea of canceling another's
article to retroactively moderate a group was highly reviled, people now
find they couldn't use the net without it. Those who railed at any attempt
to regulate internet traffic by the government loudly petition their
legislators for some law, any law it almost seems, against spam. Software
engineers who would be fired for building a system that drops traffic on
the floor without reporting the error change their mail systems to silently
discard mail after mail.

It's amazing.

Dozens of anti-spam companies have sprung up in the past few years, offering
a range of solutions including content-based filtering, blacklists,
collaborative filtering, spamtrap detection and removal, e-stamps and
some bulk detection. Remarkably, one new company called Habeas
is selling not a spam-blocking service, but a magic trademarked term that
will let legitimate mailing list owners get past the collateral damage caused
by existing spam-blocking tools. Their product is to get you past the
spam filters.

Attempts to nail down a definition of spam seem to always end in quagmire.
Each party to the debate seems determined to make sure that everything
they think is spam be included in the definition, lest one spam slip through,
but of course also keen that nothing they don't think is spam be blocked.
Reconciliation seems near impossible.

The solutions

Here's a brief summary of some of the current active methods and proposals
and how effective they are.

Content-based filters

Filters have a big advantage because they only need to be installed at the
receiver. Some of the latest filtering tools, like SpamAssassin and the
latest Bayesian algorithms, are doing quite well in terms of the amount of
spam they stop. However, they all have "false positives," which means they
falsely identify real mail as spam, and block it. Most filters have no
way to identify that mail was sent in bulk (the core requirement to spot
a spam) and thus must rely on finding common patterns used by spammers.

The hand-tuned filters need regular updating by people. The learning filters
adjust automatically but only by letting some spam through.
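
To make the "learning filter" idea concrete, here is a minimal sketch of a
naive Bayes word filter in Python. It is an illustration only, not how
SpamAssassin or any particular product works; the class name, the tokenizer
and the smoothing constants are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters, digits, $ and '.
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical learning filter that scores mail by the words it contains."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each word once per message, under "spam" or "ham".
        self.messages[label] += 1
        self.counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        # Sum the log-odds contributed by each word, with add-one smoothing
        # so that unseen words do not force the answer to 0 or 1.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))
```

Note that the filter only improves as the user labels mail: a new style of
spam scores low until some of it has got through and been marked, which is
the trade-off described above.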

In terms of effectiveness, these are second only to challenge/response tools.

Blacklists

There are many competing blacklists, some of strong ethics, others more
dubious. Nonetheless all rely on blocking mail from accused or confirmed
spammers, with debate over the standards of proof and the definitions of
spam. Some have gone so far as to blacklist entire ISPs or even nations.
Almost everybody who runs a mail server, it seems, has a story about getting
on a blacklist and having to figure out how to get off, if they were able to.

Blacklists certainly do scare ISPs, and the blacklisting of open relay servers
has, over the course of many years, done a lot to get people to close up
their relays (at the cost of making it harder for roaming users to send
E-mail).

Collaborative filters

These filters, such as Vipul's Razor (now via Cloudmark), rely on the first
poor souls who get a spam reporting it to a central server. As the reports
come in, the spam can be identified and rules can be written to block it.
These are reasonably effective, and go after bulk, which is good. They have
fewer false positives if done well. They are very similar to...

Spamtrap filters

These are primarily used by Brightmail Inc., which is probably the largest
commercial anti-spam operation. Brightmail maintains huge numbers of
addresses seeded onto spammer lists. When messages arrive, they are almost
surely spam, and human beings look at them to derive rules to filter out and
retroactively delete the messages. Very few false positives, but
unfortunately reportedly only about 60-70% effective.
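
Both the collaborative and the spamtrap approaches come down to a central
service that learns what a spam looks like from its first sightings. The
Python sketch below shows that idea in its simplest form, under loud
assumptions: the `ReportServer` class is hypothetical, the signature is an
exact hash of the whitespace-normalized body (real services are said to use
fuzzier signatures or hand-written rules), and the threshold of three
reporters is arbitrary.

```python
import hashlib
import re


def signature(body):
    # Collapse whitespace and case so trivial reformatting gives the same hash.
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ReportServer:
    """Hypothetical central server collecting spam reports."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.reporters = {}  # signature -> set of distinct reporter ids

    def report(self, reporter, body):
        # A reporter may be a user, or a seeded spamtrap address.
        self.reporters.setdefault(signature(body), set()).add(reporter)

    def is_known_spam(self, body):
        # Only distinct reporters count, so one angry user cannot
        # blacklist a message alone.
        return len(self.reporters.get(signature(body), ())) >= self.threshold
```

Mail clients would call `is_known_spam` before delivery. The first few
recipients still get the spam, since nobody has reported it yet.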

Challenge/Response

Dear to my heart because to the best of my knowledge, I wrote the first of
these, a never-productized program called Viking-12. These tools know all your existing
contacts, and when they receive mail from a new correspondent, they send
out a "challenge" E-mail that asks the mailer to do something to confirm
they are a real human being and not a spammer. When they do, the held
mail is automatically delivered and they are on the good list from then on.

These tools are extremely effective; only a few spammers ever respond to the
challenges. However, for various reasons some legitimate correspondents
also don't respond, so it is necessary to browse the list of messages that
never got a response to quickly search for the real messages. Fortunately, they
are few, and they usually have low spam scores when used in combination with
filtering tools. This can get the false positive rate extremely low.
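
The flow just described can be sketched in a few lines of Python. This is
not Viking-12 or any shipping product; the `ChallengeResponseGate` class, its
method names and the use of a random token as the "challenge" are all
assumptions made for illustration, and no mail is actually sent.

```python
import secrets


class ChallengeResponseGate:
    """Hypothetical challenge/response front end for one mailbox."""

    def __init__(self, known_contacts):
        self.whitelist = {addr.lower() for addr in known_contacts}
        self.held = {}               # token -> (sender, [held messages])
        self.tokens_by_sender = {}   # sender -> outstanding token
        self.inbox = []              # delivered (sender, message) pairs
        self.outgoing_challenges = []  # (sender, token) pairs to be mailed out

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            self.inbox.append((sender, message))
            return "delivered"
        token = self.tokens_by_sender.get(sender)
        if token is None:
            # First mail from a stranger: issue exactly one challenge.
            token = secrets.token_urlsafe(16)
            self.tokens_by_sender[sender] = token
            self.held[token] = (sender, [])
            self.outgoing_challenges.append((sender, token))
        self.held[token][1].append(message)
        return "held"

    def confirm(self, token):
        # Called when the sender answers the challenge.
        if token not in self.held:
            return False
        sender, messages = self.held.pop(token)
        del self.tokens_by_sender[sender]
        self.whitelist.add(sender)
        self.inbox.extend((sender, m) for m in messages)
        return True

    def unanswered(self):
        # The list a user should still skim for real mail.
        return [(s, m) for s, msgs in self.held.values() for m in msgs]
```

The `unanswered` list is the part that matters for false positives: it is
what the user, or a content filter, scans for legitimate mail whose sender
never replied.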

Challenge/response without scanning the non-respondents blocks anonymous
mail.

Today several companies offer them, and there are free software projects
like TMDA, which perform this function. A number of research projects have
developed what could be termed "Turing tests" for the challenges, to assure
that the respondent is a human being.

ISP bulk detection

A number of large ISPs, AOL in particular, have their own spam detectors
which rely on the fact that due to their size, they have so many addresses
that any spam attack is sure to arrive multiple times. They can thus detect
these and get rid of them. A good approach, but past history shows some
nasty false positives, with ordinary mailing lists being blocked for their
volume. One notorious case involved AOL blocking acceptance letters from
Harvard, which were sent out as a highly desired mass mailing.

This is a worthwhile technique but needs to be done with more care. Today's
collateral damage is too high.
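
As a rough illustration of what "more care" could look like, the Python
sketch below counts how many distinct local recipients receive an identical
message, and exempts senders the ISP has chosen to trust. Everything here is
hypothetical: the `BulkDetector` class, the exact-hash comparison, the
threshold, and the assumption that the sender address can be believed (in
practice it can be forged).

```python
import hashlib
from collections import defaultdict


class BulkDetector:
    """Hypothetical ISP-side detector for identical mail sent to many users."""

    def __init__(self, threshold, trusted_senders=()):
        self.threshold = threshold
        self.trusted = {s.lower() for s in trusted_senders}
        self.recipients = defaultdict(set)  # body hash -> distinct recipients

    def observe(self, sender, recipient, body):
        # Record one delivery; return True if it now looks like bulk mail.
        if sender.lower() in self.trusted:
            return False  # wanted mass mailings are never flagged
        key = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.recipients[key].add(recipient.lower())
        return len(self.recipients[key]) >= self.threshold
```

The trusted list is the piece that addresses collateral damage: volume alone
says a message is bulk, not that it is unwanted.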

Spam-banning laws

There have been many proposed anti-spam laws, and indeed around half of
all U.S. states have such laws -- California has two! While most of
these state laws will eventually be declared unconstitutional since it is
important that states not have the power to regulate something as
geography independent as E-mail, what can't be disputed is that they are
having essentially no effect. There have been very few prosecutions under
them, and spam levels continue to increase tremendously. Some hold out
more hope for a U.S. federal law; however, an increasing percentage of spam
comes from overseas. Advocates hope that even overseas spam can be stopped
by a federal law if a U.S. connection can be found. Fellow EFF board member
Larry Lessig advocates that a law which pays a bounty to those who hunt
down the U.S. connection on any spam without a mandatory label could do the
trick.

Torts

There's been a fair bit of success against big institutional spammers in
tort law. AOL and other companies have sued spamming companies using a
variety of torts to shut them down. So far, alas, like Whack-a-moles, other
spammers keep coming up. However, there have also been disturbing trends
in the tort area. For example, Intel has sued an ex-employee who spammed
Intel's entire employee base with his grievances against the company using
a legal doctrine called "trespass to chattels." Unfortunately, the
consequences of declaring E-mail to be trespass are even nastier than spam.

A large number of spams are already illegal, of course, amounting to
confidence tricks or illegal selling of prescription drugs. Some of those
laws are being used against the spammers too.

Opt-out lists

Recently, a federal do-not-call list was instituted in the USA to stop phone
spam. Unfortunately, doing the same for E-mail
is difficult and faces the same problems all laws would.

Hiding your address

The most common technique today seems to be hiding your E-mail address so
that it can't be harvested by spammers. Unfortunately, by using dictionary
attacks, they are managing to spam people who have never exposed their
E-mail in public. I consider this desire to never reveal your E-mail
one of the greatest damages done by spammers, so I don't view hiding as
a great solution to the problem.

Vigilante attack

Some anti-spammers have resorted to harassment and even extra-legal
efforts against spammers. They make a great tale to tell, but so far do
not seem to be stemming the tide. And they have all sorts of nasty
backbite, since they amount to sinking to or below the level of the enemy.

Up and coming solutions

E-Stamps

This idea is regularly re-generated.
I first proposed it myself back in 1995 and later
came to reject it. The idea is to put some low (or routinely not collected)
cost on sending an E-mail that does not bother ordinary senders, but stops
spam from being cost-effective. It has many advocates, and might work
if it could be universally adopted. However, it suffers from a "you can't
get there from here" problem. Until people are offering stamps with their
E-mail, you can't demand them, and they have little incentive to offer them
if nobody is demanding them. This technique could only be built by
piggybacking on other techniques, such as doing challenge/response and offering
stamps as a means to bypass the challenge.

Throttling bulk volume from unaccountable addresses

My current favoured proposal, detailed here.

Authentication

A number of people on anti-spam lists propose putting an authentication
regime into E-mail, to the extent that one could refuse mail that was
not digitally signed or otherwise verifiable. This would stop forging
return addresses or the use of non-existent return addresses. A number
of laws also address this.

Such schemes unfortunately abandon the longtime goal of an open E-mail system
without central management (such as a certificate authority) which allows
anonymous speech.

The Future

The spam problem will get worse before it gets better. Spammers will
try new and nastier techniques to get around the blockers, and the blockers
will try new and improved technologies. Spammers are already moving to
even nastier techniques, such as using worm programs, or exploiting security
holes in widely deployed software systems to take over other people's machines and
get them to do the spamming. It is rumoured that some spammers are using
some of the wide number of open wireless networks to drive up to a building
and spam using the network inside. Such tactics can't be countered
with blacklists, for example, though they are fortunately highly illegal.

However the spam problem is solved, or partially solved, it will remain
fascinating as the internet community grapples with its first serious
abuse issue from within. Most other abuse issues have involved outsiders,
ranging from the religious conservatives trying to ban smut to the RIAA trying
to stop file-sharing, trying to regulate the net. Spam has caused the
network insiders themselves to seek to regulate it.

This is important because it will, of course, not be the last such issue.
How we manage ourselves here will be an indicator of things to come.

I hope that as we do this we will remember the principles that make free
societies free, and the principles upon which the internet was built.
End-to-end, open designs. The ability for anybody to communicate with
anybody, even without an invitation. Ubiquitous, deliberately low-cost
communications that are not accounted for on a packet-by-packet basis.

In addition, we must realize that though all internet traffic flows over
private property, this does not mean we should forget constitutional
principles like the U.S. 1st amendment. As I view it, the 1st amendment
isn't just the law, it's a good idea. We owe a duty to preserve the
values it contains -- and the long history of how to protect them that
is embodied in 1st amendment jurisprudence -- as we architect the communications
systems of the future. For in building and regulating the internet,
we are doing no less than creating the primary platform for speech and
the press of the new century.

That is not a task to be taken lightly.

Posted by glenn at May 3, 2003 07:36 PM
