Foray into HF

[acidfree:5017 size=800]

I have been saving my pennies (er, dollars) for almost 18 months so I could buy myself a shiny new HF base station radio without breaking the bank. I ordered stuff this last Monday in hopes that it would get here in time for my birthday. I was pleasantly surprised at how fast Elecraft processed my order. It arrived Thursday evening. I started assembly last night, worked some more this morning and afternoon and now have a fully assembled K3/100 sitting on my work table.

I was very impressed with the assembly instructions provided. They were very clear, step-by-step instructions with plenty of diagrams and pictures to make sure that nothing went amiss. I am very happy with my purchase. Now I just have to wait until Monday and Tuesday for the remaining shipments of gear and supplies to get my HF radio on the air. I am still missing my 30A power supply, coax, and antenna.

Here are some pictures of the assembly process….
[acidfree:5014 size=800]
[acidfree:5015 size=800]
[acidfree:5016 size=800]

A USB CW Keyboard

After getting my amateur radio license without having to pass a CW test, I felt a little bit cheated, so I vowed that I would learn CW and do my best to help keep it alive. After all, that is one of the two things that I remember about ham radio from my childhood (beeping and antennas). In the past year and a half, I am sorry to say that I have not yet mastered CW. But I have learned a lot about learning CW. 🙂 Baby steps, right?

Because one reason I decided to get into amateur radio was to give myself an outlet for my tinkering needs, I felt it was only fair that I should devote some of this tinker time to learning CW. How do you do that? By making a touch-sensitive paddle with an iambic keyer. This is what I set out to do about six months ago and am proud to say that I have a working finished product to share today. Much of my inspiration was from the fine folks at CW Touch Keyer. Their products were very alluring and I almost bought one of them instead of building it myself, but they didn’t meet all my requirements. (Their Master Keyer was not available yet, which I think does meet all my requirements except the actual paddle part, which you must supply yourself.)

My design goals:

  • Touch sensitive paddles
  • Act as a USB HID keyboard
  • Small
  • Variable, persistent settings
    • WPM 5-100
    • Variable sidetone frequency 100-1000 Hz
    • Various keyer modes (iambic a/b, ultimatic, bug, etc.)
    • Memories (with auto repeat)

[acidfree:5010 align=left size=400]I am happy to report that I have met these goals and more with the N7OH CW-KBD. For the low, low price of $150 you can buy the parts to build your own. I think if I had plans to make this a commercial venture, I would have to cut down on my costs. First to go would likely be the Teensy because if I swapped that out for a Microchip PIC, I could also get rid of the two capacitive touch sensors. Putting that all on a single chip with a small single board, I could certainly reduce the price some. But that is a story for another day.

I started acquiring parts for the keyboard back in the April/May time frame. I started with the basics: I needed the Teensy so I could start tinkering and get back into the AVR embedded programming mode; I needed the capacitive touch sensors so I could get a board designed and start working with them (they only came in tiny surface mount packages so I had to create a breakout board for them); I also ordered some of the other stuff I would eventually need to save on shipping later. Then I excitedly jumped into Eagle and created my breakout board. I actually created a couple of designs. Since ordering with BatchPCB has a base cost plus a per-square-inch cost, I decided that ordering a couple of different designs would not be an issue. And it turns out they sent me twice as many as I ordered (probably because the designs were so small and they had extra room that wouldn’t fit anything else). That was a really fun process though; I had never designed a PCB before.

I don’t know how many hours I spent reading through the 408-page ATmega32U4 manual. I pulled out some old AVR code I had written in college and tried to make it work. I spent about as much time refactoring the old code as I would have spent writing new stuff. Finally I had some basic hardware support for timers, PWMs, and USB (with the help of LUFA). From there, I moved back to the non-embedded space to try out the main portion of CW encoding and decoding. First I whipped up a program that would write out the proper timing for dits and dahs if given a string of text to type. That didn’t take very long, and it was much faster to have printf and instant feedback without reprogramming a device. I ported this code back to the Teensy (with minimal changes, thanks to my portable coding techniques) and was able to get a simple program up and running that would blink “hello world.” at me once a minute. I moved back to userspace and figured out how to use raw events to emulate interrupts and user timers instead of hardware timers. I extended my program with a state machine that would read in dit and dah paddle presses and encode them into a stream of CW that can be decoded into ASCII and pushed up to the HID layer. My original state machine was too complex and introduced timing errors into the encoding, so I ditched it for this simpler version.
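The text-to-timing step can be sketched in Python roughly as follows. This is an illustration, not the original C code. It assumes the conventional PARIS timing, in which one dit lasts 1200/WPM milliseconds, a dah is three dits, and the gaps between elements, characters, and words are one, three, and seven dits. The names `MORSE`, `dit_ms`, and `encode` are made up for this example.

```python
# Morse table for letters, digits, and the period (International Morse Code).
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
    ".": ".-.-.-",
}


def dit_ms(wpm):
    """Length of one dit in milliseconds under PARIS timing."""
    return 1200.0 / wpm


def encode(text, wpm=20):
    """Return a list of (key_down, milliseconds) pairs for the given text."""
    unit = dit_ms(wpm)
    out = []
    for word_index, word in enumerate(text.lower().split()):
        if word_index:
            out.append((False, 7 * unit))       # gap between words
        # Characters missing from the table are skipped in this sketch.
        symbols = [MORSE[ch] for ch in word if ch in MORSE]
        for char_index, symbol in enumerate(symbols):
            if char_index:
                out.append((False, 3 * unit))   # gap between characters
            for element_index, element in enumerate(symbol):
                if element_index:
                    out.append((False, unit))   # gap between elements
                out.append((True, unit if element == "." else 3 * unit))
    return out
```

Having the output as plain (key state, duration) pairs makes it easy to print the timings on a desktop and to feed the same list to a timer-driven output routine on a microcontroller.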

[acidfree:5011 align=right size=400]It took me a while to hunt down all the itty-bitty timing issues. Sometimes there were weird little hiccups in the output that I couldn’t explain. I did finally hunt them down and get smooth operation though. Then I went and filled out the big wish list of coding features (memories, keying modes and speeds, etc.). This took some time but was quite fun. I also found and fixed a few more bugs that I uncovered while I was at it. After I had the list all checked off, I still didn’t have the nerve to permanently affix all the parts. Up until now, they were all connected on a solderless breadboard. I decided to get crazy and reduce the power consumption. It’s not like it was a pig or anything; it was already using a low-power sleep mode and was completely interrupt driven. I knew that I could reduce the 40mA power requirement with a bit of skillful coding. While it did not have any busy loops, there were a lot of wake-ups that were not needed. For example, part of the architecture is a 1ms timer that allows things to run with a 1ms accuracy. But what if nothing needs to run? It would still fire. I managed to have the things that didn’t need to run inform the timer and then have the timer shut down if there were no users. This meant that if the paddles were not pressed, it would go into a deep sleep state (<10mA) and then would wake up as soon as a paddle was pressed.
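The "shut the timer down when nobody needs it" idea amounts to counting users. Here is a minimal Python model of that bookkeeping, not the actual AVR firmware. The class name `TickTimer`, its methods, and the user names are hypothetical, and the `running` flag stands in for enabling or disabling the hardware timer interrupt.

```python
class TickTimer:
    """Model of a 1 ms tick that only runs while something needs it."""

    def __init__(self):
        self.users = set()
        self.running = False

    def acquire(self, name):
        """A subsystem announces that it needs ticks."""
        self.users.add(name)
        if not self.running:
            # On real hardware this is where the timer interrupt
            # would be enabled.
            self.running = True

    def release(self, name):
        """A subsystem announces that it no longer needs ticks."""
        self.users.discard(name)
        if not self.users:
            # No users left: stop the timer so the CPU can stay in
            # deep sleep until a paddle interrupt wakes it.
            self.running = False
```

For example, a keyer routine would call `acquire("keyer")` when a paddle is touched and `release("keyer")` once the last element and its trailing gap have been sent.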

Finally, I got brave and soldered all my parts together on a prototyping board and put it in a little plastic case. I drilled holes in all the right places to allow for the connectors (power, USB mini-B, key out, paddle out, an LED, a reset button, the speaker, the volume control, and the paddles). I skillfully mounted the two aluminum paddles on a small block of wood and then cut a groove in them to make them have a solid mechanical connection to the box. I am pretty proud of the box. After I had it all assembled, I realized it was too lightweight and would move around whenever I touched the paddles. I fixed this by adding some screws to the bottom so I could screw it to a plate of Lexan.

[acidfree:5012 align=left size=800]The architecture of the project goes something like this:
The paddles input dits and dahs that get synchronized by the timer. Depending on the keying mode, a continuously pressed paddle may or may not continuously send dits or dahs. Also depending on the keying mode, different things may happen if both paddles get pressed at the same time. The input state machine handles all of this, resulting in a queue of dits, dahs and spaces that are ready to be consumed. The output state machine looks at the queue and sends the bits to the output pins (the buzzer and the paddle/keyer pins) as well as trying to decode the stream of dits, dahs and spaces into characters. Every recognized character gets enqueued into the HID queue, which gets sent off to the computer if it is plugged in. In addition to the two paddles, there is also a single button that can enter and exit "Command Mode." Command mode allows the user to change various parameters such as buzzer frequency, keying speed, keyer mode, paddle orientation, etc. All of these settings are saved in EEPROM, so they are persistent across power losses.
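To make that data flow concrete, here is a heavily simplified Python sketch of the two state machines, not the firmware itself. It assumes one paddle sample per element slot, treats an idle slot as a character gap, and models only a basic iambic "squeeze" (alternating elements while both paddles are held). The names `DECODE`, `next_element`, and `run` are invented for the example, and the decode table is deliberately partial.

```python
from collections import deque

# Partial decode table, just enough for a small demonstration.
DECODE = {".": "e", "-": "t", ".-": "a", "-.": "n", "-.-.": "c", "--.-": "q"}


def next_element(dit_pressed, dah_pressed, last):
    """Input side: choose the element for one slot (simplified iambic)."""
    if dit_pressed and dah_pressed:
        return "-" if last == "." else "."   # squeeze: alternate elements
    if dit_pressed:
        return "."
    if dah_pressed:
        return "-"
    return " "                               # idle slot: character gap


def run(slots):
    """Feed (dit, dah) paddle samples through both stages; return text."""
    queue = deque()
    last = " "
    for dit_pressed, dah_pressed in slots:
        last = next_element(dit_pressed, dah_pressed, last)
        queue.append(last)

    # Output side: consume the queue and decode characters at each gap.
    text = []
    current = ""
    while queue:
        element = queue.popleft()
        if element == " ":
            if current:
                text.append(DECODE.get(current, "?"))
                current = ""
        else:
            current += element
    if current:
        text.append(DECODE.get(current, "?"))
    return "".join(text)
```

In this model, a dah followed by three squeezed slots yields dah-dit-dah-dit, which decodes to "c". The real device also has to handle element timing, keyer modes A/B, ultimatic and bug modes, and word gaps, none of which are modeled here.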

Kenwood TH-D72 and Linux

I recently found a buyer for my Icom IC-92AD, which enabled me to buy one of the new Kenwood TH-D72 radios. This is my first GPS-enabled device and a new radio to boot. I am thrilled. I got it in the mail in just enough time to scan through the instruction manual to figure out how to use it for the Monday night Beaverton CERT Net. I got on the air without any problem. The manual is not nearly as nice as the Icom manual was. First of all, they don’t give you the complete manual printed, only a getting started guide. The manual is on a CD in PDF format.

The TH-D72 has a mini-B USB connector and comes with a cable. Curious, I plugged it in to my computer and saw that it loaded the cp210x driver and gave me a /dev/ttyUSB0 device. Hooray!!! It didn’t work. 🙁 It turns out that the Natty kernel I am running has a regression in it (a story for another day). I tried out the Maverick kernel and it works just fine. So running the Maverick kernel, I was able to open up minicom, set the baud rate to 9600, and establish communication with the radio. It is NOT self discoverable. Grrr. I type in something and it gives me back ‘?’. It appears that there are two modes. With the packet12 TNC enabled, it will echo your keystrokes and give you a ‘cmd:’ prompt. If you type something wrong, it will say ‘?EH’. Without the TNC enabled, it does not echo keystrokes and will give you a ‘?’ if it did not understand the command you sent it.

Not seeing an obvious way to figure out the command set, I figured that we should try to reverse engineer it. I installed the MCP-4A program under Wine. I tried to run it and it complained that it needed .NET 2.0. I tried installing dotnet20 and found that is not quite enough; it wants dotnet20sp1 or greater. dotnet20sp2 does not install. dotnet30 does not install. When I run MCP-4A with dotnet20, it throws a few errors and does not give me full use of the program (no menubar, for example), but it does run. I was able to use Wireshark to sniff the USB traffic as I performed a read and write. Then I turned to Python to whip up something that can do this natively. This is what I have so far:


#!/usr/bin/python
# coding=utf-8
# ex: set tabstop=4 expandtab shiftwidth=4 softtabstop=4:
#
# © Copyright Vernon Mauery, 2010. All Rights Reserved
#
# This is free software: you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published
# by the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# This software is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
# License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with this software. If not, see <http://www.gnu.org/licenses/>.
#
# Python 2 script; requires the pyserial module.

import serial
import sys

def command(s, command, *args):
    # Send a text command terminated by a carriage return, then read
    # the reply up to its terminating carriage return.
    cmd = command
    if args:
        cmd += " " + " ".join(args)
    print "PC->D72: %s" % cmd
    s.write(cmd + "\r")

    result = ""
    while not result.endswith("\r"):
        result += s.read(8)

    print "D72->PC: %s" % result.strip()

    return result.strip()

def l2b(*l):
    # Build a byte string from a mix of strings and integer byte values.
    r = ''
    for v in l:
        if type(v) is str:
            r += v
        else:
            r += chr(v)
    return r

def bin2hex(v):
    # Format a byte string as space-separated hex pairs.
    r = ''
    for i in range(len(v)):
        r += '%02x ' % ord(v[i])
    return r

def bin_cmd(s, rlen, *b):
    # Send raw bytes and return the rlen-byte reply as a hex string.
    cmd = l2b(*b)
    print "PC->D72: %s" % cmd
    s.write(cmd)
    result = bin2hex(s.read(rlen)).strip()
    print "D72->PC: %s" % result
    return result

def usage(argv):
    print "Usage: %s <serial port> <output file>" % argv[0]
    sys.exit(1)

if __name__ == "__main__":
    if len(sys.argv) < 3:
        usage(sys.argv)

    s = serial.Serial(port=sys.argv[1], baudrate=9600, xonxoff=True, timeout=0.25)

    #print get_id(s)
    #print get_memory(s, int(sys.argv[2]))
    print command(s, 'TC 1')
    print command(s, 'ID')
    print command(s, 'TY')
    print command(s, 'FV 0')
    print command(s, 'FV 1')
    # Enter programming mode, then switch to the faster baud rate.
    print bin_cmd(s, 4, '0M PROGRAM\r')
    s.setBaudrate(57600)
    s.getCTS()
    s.setRTS()
    of = file(sys.argv[2], 'wb')
    for i in range(256):
        sys.stdout.write('\rfetching block %d...' % i)
        sys.stdout.flush()
        s.write(l2b(0x52, 0, i, 0, 0))  # 'R' + block address
        s.read(5) # command response first
        of.write(s.read(256))
        s.write('\x06')                 # ACK the block
        s.read()
    print
    of.close()
    print bin2hex(s.read(5))
    print bin2hex(s.read(1))
    print bin_cmd(s, 2, 'E')            # leave programming mode
    s.getCTS()
    s.setRTS()
    s.getCTS()
    s.getCTS()
    s.close()

You run it like this:

$ python thd72.py /dev/ttyUSB0 d72-dump.dat

Unfortunately from what I have seen, two consecutive reads without any changes on the radio seem to have very big differences. It is as though some of the chunks of the file are rotated or shifted by a few bytes (and the shift is not constant throughout). Not seeing an immediate reason for this, I suspect that it is some form of obfuscation. Call me a pessimist.
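One way to narrow down where two dumps disagree is to compare them block by block. The snippet below is a hypothetical Python 3 helper, separate from the script above. The function name `differing_blocks` is made up, and the 256-byte block size simply mirrors the size of the reads in the dump loop.

```python
def differing_blocks(a, b, block_size=256):
    """Return the indexes of the blocks in which two dumps differ."""
    length = max(len(a), len(b))
    return [
        offset // block_size
        for offset in range(0, length, block_size)
        if a[offset:offset + block_size] != b[offset:offset + block_size]
    ]


# Example use with two dump files (the file names are placeholders):
#
#     with open("dump1.dat", "rb") as f1, open("dump2.dat", "rb") as f2:
#         print(differing_blocks(f1.read(), f2.read()))
```

If only a handful of block numbers come back, the problem is probably localized; if nearly every block differs, a framing or synchronization issue in the read loop is a more likely suspect than obfuscation.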

I will continue to work on this, but I would love to see what others in the community are doing as well.

Update:

I forgot to mention that the whole point of this exercise was to find a way to work it into CHIRP. I am currently working on a driver for this radio to enable it in CHIRP. And as I was looking over the tmv71 code in CHIRP, I noticed that I should be reading a response to the read block command _before_ I actually read the block data. This seems to help things out a bit (and I modified the above code to match).

Roasted Cranberry Sauce

One of Oregon’s food exports is the cranberry, accounting for about 5% of the nationwide harvest. This year, we bought some local cranberries to make into cranberry sauce. My first time ever. Historically, it was Dad’s job (which he relished a lot) to make the cranberry sauce. His was usually ground up with oranges, sugar and nuts. I always tried some, but never LOVED it. But when I saw Nick post a Roasted Cranberry Sauce recipe, I had to try it. Since I don’t have any Triple Sec around, I modified it a bit, but it turned out great. I made it last night so it could mellow in the fridge overnight before the big day. And I daresay, this is the best cranberry sauce I have ever had. Sorry, Dad.

Roasted Cranberry Sauce with Candied Pecans (Adapted from Macheesmo, where it was pretty heavily adapted from a Bon Appétit recipe)
Makes about 3 Cups, easy to double or triple though.

Cranberries:

  • 1 pound fresh cranberries
  • 1 Cup sugar
  • 3 Tablespoons neutral oil
  • 1 Tablespoon fresh rosemary, minced
  • 1 Teaspoon fresh thyme, minced (or 1/4 t. dried)
  • 1 Teaspoon fresh sage, minced (or 1/8 t. sage powder)

Sauce:

  • 1/8 Teaspoon cinnamon
  • 1/3 Cup orange juice
  • 1/4 Cup sugar
  • 1/2 Cup currants (you could sub raisins, but chop them roughly so they aren’t so big)
  • Pinch of salt

Pecans:

  • 1 Cup pecans, roughly chopped
  • 2 Tablespoons water
  • 1/4 Cup sugar

Instructions:

  1. Mix cranberries, oil, 1 C. sugar, and herbs together in a bowl. Roast at 425°F for 20 minutes, stirring after 10 minutes.
  2. While cranberries are roasting, mix sauce ingredients in a medium sauce pan and simmer for 10 minutes.
  3. Remove cranberries from oven and add to the sauce pan. Simmer for another 2-3 minutes.
  4. While simmering the sauce, mix the pecan ingredients and spread on a baking sheet and roast for 8-10 minutes at 425°F.
  5. Remove nuts from the oven and stir as they cool. Place cooled pecans in an airtight container.
  6. Chill sauce overnight in fridge. Serve heated or chilled, topped with pecans.

Tame your bash history

I am a packrat, but I do like a bit of order. This makes maintaining my bash history difficult. There are some commands that I use frequently that seem to fill up my history file, making it hard to keep some of the lesser-used, yet very important, commands in the history. Finally sick of the problem, I pored over the manpage for bash and found the section on HISTCONTROL. From the description there, I found that with this, along with HISTIGNORE, I can almost eliminate the problem of my bash history getting too full of stupid common commands.

I added this to my ~/.bash_profile:


export HISTIGNORE="&:ls:[bf]g:disown:cd:cd[ ]-:exit:^[ t]*"
export HISTCONTROL=ignoredups:ignorespace:erasedups
export HISTFILESIZE=2000

Here is the snippet from the bash manual that corresponds to these controls:

       HISTCONTROL
              A colon-separated list of values controlling how commands are saved on the
              history list. If the list of values includes ignorespace, lines which begin
              with a space character are not saved in the history list. A value of
              ignoredups causes lines matching the previous history entry to not be saved.
              A value of ignoreboth is shorthand for ignorespace and ignoredups. A value of
              erasedups causes all previous lines matching the current line to be removed
              from the history list before that line is saved. Any value not in the above
              list is ignored. If HISTCONTROL is unset, or does not include a valid value,
              all lines read by the shell parser are saved on the history list, subject to
              the value of HISTIGNORE. The second and subsequent lines of a multi-line
              compound command are not tested, and are added to the history regardless of
              the value of HISTCONTROL.
       HISTFILESIZE
              The maximum number of lines contained in the history file. When this variable
              is assigned a value, the history file is truncated, if necessary, by removing
              the oldest entries, to contain no more than that number of lines. The default
              value is 500. The history file is also truncated to this size after writing
              it when an interactive shell exits.
       HISTIGNORE
              A colon-separated list of patterns used to decide which command lines should
              be saved on the history list. Each pattern is anchored at the beginning of
              the line and must match the complete line (no implicit `*' is appended). Each
              pattern is tested against the line after the checks specified by HISTCONTROL
              are applied. In addition to the normal shell pattern matching characters, `&'
              matches the previous history line. `&' may be escaped using a backslash; the
              backslash is removed before attempting a match. The second and subsequent
              lines of a multi-line compound command are not tested, and are added to the
              history regardless of the value of HISTIGNORE.
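To see how these rules interact, here is a rough Python model of the behavior described in the manual excerpt. It is only an approximation for experimenting, not how bash is implemented: `fnmatch` stands in for shell pattern matching, backslash-escaped `&` is not handled, and the function names are invented for this sketch.

```python
from fnmatch import fnmatchcase


def should_save(line, history, histcontrol, histignore):
    """Decide whether a command line would be saved to the history list."""
    previous = history[-1] if history else None
    # HISTCONTROL checks are applied first.
    if "ignorespace" in histcontrol and line.startswith(" "):
        return False
    if "ignoredups" in histcontrol and line == previous:
        return False
    # Then each HISTIGNORE pattern must match the complete line.
    for pattern in histignore:
        if pattern == "&":
            if line == previous:
                return False
        elif fnmatchcase(line, pattern):
            return False
    return True


def add_to_history(line, history, histcontrol, histignore):
    """Append a line to the history list, honoring erasedups."""
    if not should_save(line, history, histcontrol, histignore):
        return
    if "erasedups" in histcontrol:
        history[:] = [entry for entry in history if entry != line]
    history.append(line)


# A few of the patterns and all of the HISTCONTROL values from my settings.
HISTCONTROL = {"ignoredups", "ignorespace", "erasedups"}
HISTIGNORE = ["&", "ls", "[bf]g", "disown", "cd", "cd[ ]-", "exit"]
```

Feeding it `ls`, `make`, `make`, ` secret` (with a leading space), `cd -`, `git status`, and `make` in that order leaves only `git status` and `make` in the list: the plain `ls` and `cd -` match patterns, the repeated `make` is a duplicate, and the final `make` erases the earlier copy before being saved. Note that `ls -l` would still be saved, because the `ls` pattern has to match the whole line.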

The Perfect Spammer

Going through the daily blogroll this morning, I came upon a comic that could not be more true. A while back I posted on Combating Spambots. Since I implemented that anti-spam scheme, I have not seen a single piece of spam come through. It is beautiful. XKCD has posted an alternative method, which really would work if you had an active community around your website. The perfect spammer could get through, but I think I would be okay with that.

Spicy Mediterranean Chili

Recently I participated in a chili contest and this is the recipe that I came up with. I was feeling like making something different than your standard run-of-the-mill chili, so I went with a Mediterranean theme. I was hoping to win the hottest chili award, but I did not. But talking with the two other chefs at the contest who had the other two spiciest chilies, I think it was agreed that mine was the spiciest. I think that some people just tried two chilies, and voted for the spiciest of those two or something. The one that won was not even remotely spicy. I think it was rigged. Anyway, I digress. I had it all made up and ready to go, but it was lacking a little depth in flavor, so I added a handful of Guittard dark chocolate. That did the trick. It turned it from a rosy red to a nice brown and gave it that je ne sais quoi I was hoping for. Chocolate and chilies are the best of friends, right? My chili recipe from last year was not good enough to repeat, but this one most definitely is. If this isn’t hot enough for you (on a scale of one to habanero, I give it a four), you can always add a habanero or even just some hot chili sauce.

Spicy Mediterranean Chili

Original recipe by Vernon Mauery

Ingredients

  • 1 clove garlic, minced
  • 1 large onion, diced
  • 2 carrots, grated
  • 4 jalapeño peppers
  • 3 cherry bomb peppers
  • 3 sweet banana peppers
  • 2 large (or 3 medium) red bell peppers
  • 4 medium tomatoes (about 1 lb.)
  • 2 C. prepared garbanzo beans (~3/4 C. dry)
  • 2 C. prepared black beans (~3/4 C. dry)
  • 1.5 oz. dark chocolate (or baker’s chocolate)
  • 1 lb. bone-in lamb shank
  • 1 tsp cumin
  • 1 tsp ground mustard
  • salt
  • water
  • olive oil

Directions

T-8 hours:
Rub salt on the lamb shank and place it in the crockpot.

T-4 hours:
Wash tomatoes and skin them. The easiest way to do this is to blanch them or roast them until the skins come off nicely. Dice them and add them to the crockpot.

Wash peppers and then roast them. The bigger peppers will take longer. Make sure that most of the pepper is charred or the skins will not come off easily. As they come off the grill (or broiler), place them in a covered container to continue to steam themselves. With protective gloves on, skin and core the peppers. You can keep or discard the seeds as desired. Dice all the peppers except the bell peppers and add them to the crockpot.

Add about one cup of water to the grated carrots in a saute pan. Cover and simmer over medium heat for about 20 minutes. Remove lid, add bell peppers and continue to cook until most of the water has evaporated. Pour mixture into a food processor and puree. Add puree to crockpot.

Add diced onion to the saute pan with about 1 T. olive oil. Saute for 4 minutes over medium heat. Add the garlic and saute for 1 more minute. Add mixture to crockpot.

Add the prepared garbanzo beans, with their water, to the crockpot. Discard the water from the black beans and rinse them before adding to the crockpot.

Add cumin and mustard. Add salt as needed (depending on the saltiness of the beans and how much salt was on the meat).

T-1 hour:
Pull the meat out and coarsely shred it with two forks. Put meat and bone back into the crockpot.

Add chocolate and stir until melted and dispersed throughout. Adjust spices as desired.

Serving Suggestion:
Serve with pita bread, chopped green olives, and feta cheese.

The Emperor’s New RF Exposure Calculator

It has been twelve days since I made my RF Exposure Calculator available for all to use. I admit that there were a few bugs in it when I first released it. But nothing that didn’t get fixed within a day or two. You see, it being open source and all, I figured I should release early and release often. So what you see today is about 26 commits newer than the original.

I just can’t believe that it was my own naïveté that expected a warmer reception from the ham world. I mean, there are no other RFE apps that can even come close to how cool mine is. And I am not just saying that to toot my own horn. All the other applications make you type in numbers and information time and time again. For each little change you have to type new stuff in again. And they don’t remember what you typed in yesterday. Come on folks, get on the Web 2.0 bandwagon already (or something buzz-wordy like that). I got some positive feedback, for which I am very thankful. (This mad machine runs on props!) But I also got a bunch of “I don’t get it,” and “Where is the program? – All I see is a tabbed help page!” or “nada”. All I have to say to you folks is RTFM!

The grand old story of The Emperor’s New Clothes comes to mind. I wrote this awesome RF exposure calculator that only works for smart people. So if it doesn’t work for you, well… sorry. Only I am not really that sorry. I mean it would work for you if you could only read. I designed it so it would start with help text showing if there was nothing else to display (thus the tabbed help), which TELLS YOU EXACTLY HOW TO USE IT! GAAAAAHHHHHH!

Okay. That felt good. And really, this post was half therapy for me and half directed right at the anonymous coward who says that “Blogs are the verbal equivalent of vomiting!” with reference to my blog. This barf’s for you.

Radio Frequency Exposure (RFE) Calculator

So far in my amateur radio career, I have not been able to offer much that may be of use to other hams. That changes today. A while back, when I was dreaming about where to put my antennas safely, I did a lot of research about radio frequency exposure. I pored over OET Bulletin 65, which details the FCC’s limits on human exposure to RF electromagnetic fields. They have formulas and tables and forms to fill out. It is all wonderful and fine, if you live in the 1960s. Welcome to the 21st Century. We live in a world of computers to do all that number crunching for you. I looked around for any web-based things that would help, but the closest I could find was a power density calculator written by W4/VP9KF. This is fine if you want to do it for EVERY band on EVERY transmitter each time you make a change to your station. Plus, it means that I have to transmit all that data to his PHP script, which does the calculations and sends them back. We have this great thing in web browsers called JavaScript, which is more than powerful enough to do the work. I set upon creating a JS-only version of his creation. But it still lacked the memory: I would still need to re-enter for each band for every change. And it wouldn’t let me view multiple bands at once. Bigger calculator!

This is where my offering steps in. My requirements:

  1. Save my data so I don’t have to re-enter everything every time
  2. Something I can share with others, without saving their data on my server
  3. Let me add, edit, delete at will
  4. Something that can show all my transmitter/antenna/connection information at once

Seems easy enough, right? It was the first two that really got me stuck. I whipped up a little JavaScript ditty that fulfilled number four in very little time at all. Number three was dependent upon the first two and was technically the hardest, but once I had the first two figured out, it was only coding, which I enjoy.

And this is what I came up with: N7OH RFE Calculator. Take it for a spin, share it with your friends. Upon your initial visit, it may not look like much, but if you move over to the “Import/Export” tab, you can press the “Reset to sample data” button and see it in action. Please offer suggestions and comments if you find it to be too difficult to use or see something that might make it better.

As for fulfilling my four requirements, the first two were done once I learned about local storage with HTML 5. This means that your web browser is storing the data. Not as a cookie, but similar. Cookies get sent back to the server with each request. Local storage is meant to be persistent data that a web page can access via JavaScript to be used locally. This means I can save my data on my machine and your data on your machine. I can host the page for everyone, yet not save everyone else’s data on my server. The add/edit/delete requirement was probably the most fun I have had with jQuery to date. And I hardly scratched the surface of what it can do. Lastly, the glory of the Results tab just makes me weak in the knees. Okay, not really, but it is the crown jewel of the whole application. It shows all the stuff you want to know about your radio setup.
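For a feel of the number crunching involved, here is a minimal Python sketch of the basic far-field power density estimate from OET Bulletin 65, S = PG/(4πR²). It is not the calculator's actual JavaScript. The function name and parameters are made up for this example, `power_w` is assumed to be the average power reaching the antenna (after feed line loss and duty cycle), and the optional ground reflection factor follows the bulletin's convention (1.0 for none, 2.56 for the commonly used 1.6² factor, 4.0 for full reflection).

```python
import math


def power_density_mw_cm2(power_w, gain_dbi, distance_m, ground_reflection=1.0):
    """Estimate far-field power density in mW/cm^2.

    power_w           average power at the antenna, in watts
    gain_dbi          antenna gain in dBi
    distance_m        distance from the antenna, in meters
    ground_reflection multiplier for ground reflection (1.0, 2.56, or 4.0)
    """
    gain = 10 ** (gain_dbi / 10.0)            # dBi to a plain power ratio
    s_w_m2 = ground_reflection * power_w * gain / (4 * math.pi * distance_m ** 2)
    return s_w_m2 * 0.1                       # 1 W/m^2 equals 0.1 mW/cm^2
```

The result is then compared against the exposure limit for the band in question. Those limits vary with frequency and are tabulated in the bulletin, so they are left out of this sketch.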

Combating SpamBots

The war against spam is ever escalating. Two weeks ago I took my anti-spam tactics to the next level. I want people to be able to post comments to my website without registering. Anonymous comments (or rather unverified authors of comments) should be available if the webmaster sees fit. But I found that over the past several months comment spam was getting to be a real problem. I logged in one day and found that there were several hundred spam comments that had gone unnoticed for quite some time. At that time, I did not have any anti-spam measures. I looked around and added a CAPTCHA to the comment form. That stopped most of the spam, but the determined spammers were still getting through.

IP addresses in failed CAPTCHA log    Number of failed CAPTCHA responses
212.117.164.8                         514
219.252.44.66                         250
125.64.96.21                          160
69.46.27.117                          158
61.145.121.124                        138
206.224.254.5                         111
212.116.220.224                       78
203.77.204.82                         78
203.198.126.43                        73
211.115.75.169                        72
221.214.27.252                        69
192.104.18.19                         60
218.25.99.135                         60
212.116.220.154                       54
200.123.147.169                       54
95.154.242.207                        52
89.233.152.157                        50
2700+ other unique hosts              <50 hits per host

In the past 2 months, I have logged more than 14,000 failed CAPTCHA attempts. Most of the unique hosts have one or two failures, but more than 1,000 unique IP addresses have four or more failures. At some point you have to draw the line and I draw it at four. Or maybe three. One or two failures can easily happen even if a bona fide person is responding. But usually only spambots are dumb enough to get more than three failures.

I can characterize the failures, and many of them seem to follow a certain form: hit twice in rapid succession and then give up for a while. Two hits alone are not usually successful; the bot usually guesses an empty string or 0 or 1. The problem is that if you are using a math CAPTCHA, those can be the right answer. And obviously, if the spambot keeps at it two at a time, it will eventually guess correctly and be able to post. I found that the spambot was able to crack several of the CAPTCHAs I offered: ReCAPTCHA, math, word list, word order, etc. Other than ReCAPTCHA, they can all be cracked by random entries. I am not sure how they managed to crack ReCAPTCHA. But it was starting to make me angry at all the spam. Finally, in addition to CAPTCHA I resorted to using comment moderation, requiring me to log in and manually approve all comments. I really don’t like this because sometimes I forget. Then the comments get old and people think I don’t care.

I did a little hunting around the Drupal front and found Mollom. This is a nice line of defense against spam. But I read elsewhere that in some cases it wasn’t catching it all. Remember that spambots are in it for the speed and money, so their GET to POST times are very short. I whipped up a little module that checks that. All you super-human typists had better slow down when commenting on my forms. Then I took a page out of Ignacio Segura’s book and added a honeypot to the comment form in my little module as well. Though you will not see it (unless you are looking at the HTML source, reading with a non-CSS compliant browser like lynx, or are a spambot), it is meant to be left empty and will cause a form rejection if it has any text in it.
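The two checks boil down to something like the following Python sketch. The real thing is a Drupal module written in PHP, so this is only an illustration of the logic. The function name, the timestamp arguments (seconds since some epoch), and the five-second minimum are assumptions made for the example.

```python
def rejection_reason(rendered_at, submitted_at, honeypot_value, min_seconds=5.0):
    """Return why a comment submission looks automated, or None if it passes.

    rendered_at    time the form was served (the GET), in seconds
    submitted_at   time the form came back (the POST), in seconds
    honeypot_value contents of the hidden field that humans never see
    min_seconds    hypothetical minimum time a human needs to fill the form
    """
    if honeypot_value.strip():
        return "honeypot field was filled in"
    if submitted_at - rendered_at < min_seconds:
        return "form was submitted too quickly"
    return None
```

A submission that comes back one second after the form was served, or with anything typed into the hidden field, gets a reason string and can be rejected; a normal comment returns `None`.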

Then one step more. Because what is escalation if you are not really accelerating? I noticed that once spambots did get in that they usually were ‘advertising’ for companies of ill repute. Offering things like p1Lz and other items to EnH4Nc3 certain parts of one’s body. But in order to get around blacklists for certain words, they intentionally misspell what they are advertising for and also have links to obscurely named domains (which are usually not words either.) I figured any rational thinking human being would spell at least 75% of their words correctly (and that includes things like spambot and acronyms and other non-English shortcuts). So my latest addition to the spam warfare is PHP’s pspell library. So all you spammers out there had better spell it right.
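The spelling test is just a ratio check. Below is a small Python sketch of the idea, with a tiny word set standing in for a real dictionary (the actual check uses PHP's pspell library, which is not shown here). The function names and the sample word list are made up; the 75% threshold is the one mentioned above.

```python
import string

# Tiny stand-in for a real dictionary lookup.
KNOWN_WORDS = {"this", "is", "a", "great", "post", "thanks", "buy", "now"}


def spelled_ratio(text, known_words):
    """Fraction of whitespace-separated words found in the dictionary."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    if not words:
        return 1.0
    good = sum(1 for w in words if w in known_words)
    return good / len(words)


def passes_spell_check(text, known_words, threshold=0.75):
    """Accept a comment only if enough of its words are spelled correctly."""
    return spelled_ratio(text, known_words) >= threshold
```

With this word list, "This is a great post, thanks" scores 100%, while "buy p1Lz now EnH4Nc3" scores 50% and is rejected.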

[acidfree:5001 size=800]Then as the final blow to spammers (and bad spellers everywhere) I added a “three strikes and you are out” gotcha where if you fail the previous tests more than a given number of times, you will get added to the blacklist. All entries in the blacklist are forbidden to access any part of the website. Permanently. And it seems to work. I have not seen any spam get past the filters in the last two weeks that this has been in effect. Let’s hope this lasts.
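The strike counting can be modeled in a few lines of Python. Again, this is a sketch of the logic rather than the module's PHP code. The class name, the in-memory storage, and the default of two allowed failures (so the third strike is the last) are assumptions for the example.

```python
from collections import Counter


class StrikeList:
    """Count failed checks per IP address and blacklist repeat offenders."""

    def __init__(self, allowed_failures=2):
        self.allowed_failures = allowed_failures
        self.failures = Counter()
        self.blacklist = set()

    def record_failure(self, ip):
        """Register one failed check; blacklist the IP past the limit."""
        self.failures[ip] += 1
        if self.failures[ip] > self.allowed_failures:
            self.blacklist.add(ip)

    def is_allowed(self, ip):
        """False means the request should get an HTTP 403 response."""
        return ip not in self.blacklist
```

A real implementation would keep the counters and the blacklist in the database so they survive restarts, and would run the `is_allowed` check before any other page processing.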

I was curious about the actual counts of things, so I whipped up a few SQL queries that gave me the statistics that I wanted. I pushed it all into OOo and came up with this fine chart. There are a couple of things to note:

  • This is about a month of data.
  • The yellow line (number of daily comment spam posts) is on the scale to the right. The other two lines are on the scale to the left.
  • The first day I tried all this stuff out (29 Jul) I didn’t actually have the blacklist implemented, which accounts for no HTTP/403 entries on that day.
  • There has been zero comment spam since 29 Jul. It is not for a lack of trying.
  • The blue line shows the number of newly recognized SpamBot IP addresses.
  • The red-orange line shows the number of attempts from previously identified SpamBots that got rejected by the blacklist.
  • I find it quite funny that the HTTP/403 line looks like my server is flipping the bird at the SpamBots. That’s what it is doing…. And no, I did not doctor the data.
  • I see that there seem to be trends or waves of spam. That is fascinating and frightening all at the same time.

Do you do anything to combat spam on your sites? Obviously comment moderation is the only truly perfect filter, but it requires so much work. Especially when I really don’t get that many human comments per day, but loads of spam attempts.

Today ends with Vernon: 15, SpamBots: 0.