Shiny yak: shave all HTML email!

The year is 2012. Some curmudgeonly persons who missed the memo still refuse to embrace web 2.0 and hold on to their beloved, but old-fashioned, fetchmail + procmail + mutt + emacs + msmtp setups for email processing. Your humble correspondent is one of these persons. This blog thing (how quaint) is obvious evidence.

At some point in the past, I was so curmudgeonly that my email signature was this:

sigs waste bandwidth.

If you're in the above-stated curmudgeonly group, you might even recall this blast from the past:

:    /"\
:    \ /    ASCII Ribbon Campaign
:     X   against HTML email & vCards
:    / \

That war was valiantly fought, and lost. (It seems that the old warriors have silently admitted defeat and fatigue. Maybe winning the war against vCards was a reasonable compromise.) One reason would be that most email clients (notably Gmail's web interface and Outlook) chose to ignore the outcry and compose HTML email by default. Another reason would be that bandwidth and disk space are no longer expensive and scarce. Yet another reason would be that, unlike in the days of olde, the majority of email users are not nerds who care about this stuff.

This, however, bothers our curmudgeonly person, who has faithfully subscribed to a number of old-fashioned mailing lists that are (of course!) affected by the "problem" of HTML email. (Dealing with HTML email from friends and associates was bad enough, but you'd expect better from fellow nerds in bloody mailing lists. But alas.) On confronting this nasty problem everywhere, our curmudgeonly person would think: "I know, I will unleash the might of procmail on these bastards!"

Now he has a buttload of butt-ugly problems. You've got to pay attention to this anyway, because this is a kinda-sorta shiny yak.

What didn't work

One trouble with HTML email is that it usually really consists of two parts: a text/plain part and a text/html part. With procmail, it should be fairly straightforward to add a filter that gets rid of the text/html part, but it wasn't. I tried a couple of things that didn't work:

  • Use pm-jamime-kill.rc from procmail-lib: this didn't work since it depends on mimencode, which is not available in Debian any more. See, even those curmudgeonly Debian people have given up on you!
  • mimefilter would suit old-fashioned curmudgeonly mailing list admins. It scrubs the offending email clean of its text/html part before forwarding it to a list, and sends a stern warning to the originator of the email. I might be curmudgeonly but I'm not a mailing list admin, so I don't want to warn anyone. I just want "cleaner" email. There's no configuration parameter to change this behavior, and I wasn't inclined to hack on the script.
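
As an aside, the two-part structure mentioned above is easy to see from a program. The following is only an illustrative sketch in Python using the standard email module, not one of the tools I tried; the message, its addresses and the boundary string are made up for the example.

```python
from email import message_from_string

# A made-up message for illustration; addresses and boundary are hypothetical.
RAW = """\
From: someone@example.org
To: list@example.org
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

Hello, world.

--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body><p>Hello, <b>world</b>.</p></body></html>

--XYZ--
"""


def part_types(raw):
    """Return the content type of every MIME part, outermost first."""
    msg = message_from_string(raw)
    return [part.get_content_type() for part in msg.walk()]


print(part_types(RAW))
# ['multipart/alternative', 'text/plain', 'text/html']
```

The outer multipart/alternative container is what wraps the two versions of the same mail; the text/html child is the part a curmudgeon wants gone.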

What did work

The solution finally turned up in a little script from an old Perlmonks.org post. (The title of that post, Strip Brain-Damaged Mails of "HTML Alternative" Evilness!, is perhaps reflective of the global nerd community's sentiments at the time.) I saved the script, and stuck this in my ~/.procmailrc:

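# The script saved from the Perlmonks post
CLEANMAIL=$HOME/bin/clean-mail.pl

# Mailing list mail: \/ makes procmail store what follows, i.e. the
# text between < and > in the list header, in $MATCH
:0
* ^((List-Id|X-(Mailing-)?List):(.*[<]\/[^>]*))
{
    LISTID=$MATCH

    # Folder name = list id up to the first "." or "@". Pipe the mail
    # through the script and append the result to that folder (with a
    # lockfile, hence the second colon).
    :0:
    * LISTID ?? ^\/[^@\.]*
    | $CLEANMAIL >> $MATCH
}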

This will "automagically" sort mailing list mail into folders by name of the list, while cleaning the suckers of any evil text/html. (There's a bit of Org-mode fail in there: please ignore the ORG-LIST-END-MARKER line if you see it.)
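
For readers who would rather not run Perl, here is a rough idea of what such a cleaning filter has to do. This is a minimal sketch in Python under my own assumptions, not the Perlmonks script and not a drop-in replacement for it: the names strip_html_alternative and main are hypothetical, and it simply removes text/html children from any multipart/alternative container that also holds a text/plain part, leaving everything else alone.

```python
import sys
from email import message_from_bytes


def strip_html_alternative(raw: bytes) -> bytes:
    """Drop text/html parts from multipart/alternative containers.

    Hypothetical helper for illustration. The mail is returned unchanged
    unless a text/plain alternative would survive the removal.
    """
    msg = message_from_bytes(raw)  # default (compat32) policy
    changed = False
    for part in msg.walk():
        if part.get_content_type() != "multipart/alternative":
            continue
        if not part.is_multipart():
            continue  # malformed container, leave it alone
        children = part.get_payload()
        kept = [c for c in children if c.get_content_type() != "text/html"]
        has_plain = any(c.get_content_type() == "text/plain" for c in kept)
        if has_plain and len(kept) < len(children):
            part.set_payload(kept)
            changed = True
    if not changed:
        return raw
    # Keep the mbox "From " line only if the incoming mail had one.
    return msg.as_bytes(unixfrom=msg.get_unixfrom() is not None)


def main() -> None:
    """Filter one mail from stdin to stdout, the way procmail pipes it."""
    sys.stdout.buffer.write(strip_html_alternative(sys.stdin.buffer.read()))
```

To try it as a filter, one would end the file with a call to main() and point CLEANMAIL at it. The sketch keeps the multipart/alternative wrapper around the surviving text/plain part rather than flattening the mail, which is the lazier but safer choice as far as encodings go.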

Conclusion

You might be inclined to make fun of Perl (and procmail) because (a) you're too cool to use this old ugly hippie proletarian stuff, or (b) you complain that Perl and the like ignore the advancements made in programming language research over the last twenty or so years, or (c) you tend to compare ParrotVM to Duke Nukem Forever, or (d) all of the above.

But I've got to advocate: it works, and it works flawlessly, for certain values of "works".

It's a shiny enough yak.