Columbia Basin Internet <-> Packet Radio Gateway
 





Long-term mystery SOLVED!
Tuesday, 06 February, 2007, 01:23 PST


GATEWAY NEWS

For a few years, I have been having an intermittent problem with certain, infrequent Internet SMTP messages inducing TNOS crashes on ALWGW.  The occurrences have caused more than one headache, but I am glad to say that I now have a solution for it!


Before I proceed to the symptoms, diagnosis and solution, I should preface this by describing my SMTP layout here.  I have the Postfix MTA running on the same Linux box as TNOS (version 2.40), and netfilter (iptables) is configured to force all incoming and outgoing Internet SMTP traffic to relay through Postfix.  This keeps the relatively fragile TNOS SMTP server from receiving the assorted probes and buffer overflow scripts, and redirects them to the relatively robust Postfix server instead.  I am also able to use Postfix to reject spam and Microsoft viruses, and to pattern match recipient addresses for something that resembles a ham radio callsign.  Postfix is highly configurable, and I can do a great deal with it.
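
As an illustration of the callsign check mentioned above, here is a minimal sketch in Python.  It is not the Postfix configuration used on this gateway, which is not shown.  The pattern, the function name looks_like_callsign, and the example.org addresses are all assumptions, and real callsign formats vary more than this simplified pattern allows.

```python
import re

# Assumed, simplified callsign shape: a one- or two-character prefix,
# one digit, then one to four letters (for example K7ABC, W1AW, 2E0ABC).
CALLSIGN = re.compile(r"(?:[A-Z]{1,2}|[0-9][A-Z]|[A-Z][0-9])[0-9][A-Z]{1,4}")


def looks_like_callsign(address):
    """Return True if the local part of address resembles a callsign."""
    local_part = address.split("@", 1)[0].upper()
    return CALLSIGN.fullmatch(local_part) is not None
```

With this sketch, looks_like_callsign("k7abc@example.org") is accepted, while an address such as postmaster@example.org is not.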

Symptoms

Here's what happens.  A message for one of my users arrives and is processed by Postfix, is queued up, and then delivery is attempted to TNOS.  TNOS actually receives the message and writes it to the user's mail file on the system, but does not complete the SMTP transaction before receiving a SIGSEGV from the operating system.  Consequently, Postfix keeps the message in its queue, and TNOS restarts.  After the timeout period expires, Postfix again attempts to deliver the message to TNOS, and the process repeats itself.  Over and over, every time the delivery of the message is attempted, TNOS crashes.  Each time, an additional copy of the message is written to the user's mail area before TNOS dies.  Interestingly, when listing these messages, each one has the recipient's address and the message size in the listing, but no sender, date, or subject (even though these appear in the header).  I didn't realize that this was significant until later.  In the past, I've looked at the offending messages with Postfix's postcat utility, but never really noticed anything that caught my eye.

Diagnosis

Due to an assortment of crashing problems last month, I was finally able to get core dumps out of this latest batch of segmentation violation crashes.  This was excellent news!  Running gdb on the core file, I found that strcmp() was being called from within TNOS' reject.c code (line 149) with one of its string arguments pointing to memory location 0x0.  When TNOS attempted to do a strcmp at memory location 0x0, the Linux kernel firmly gave it a SIGSEGV, hence the crash.

Since this was occurring in reject.c, would the offending message pass if "pbbs reject" was turned off?  Since I had two offending messages to work with this morning, I gave TNOS the "pbbs reject off" command and then released ("unheld" with Postfix's postsuper) one of the messages causing the crashes.  Sure enough, it was delivered without causing TNOS to crash.  Progress!  However, I noticed that the delivered message looked the same on the filesystem as the others described above; that is, the message listing still left off the sender, date, and subject.

Now, why was the pointer variable cmd[1] (the strcmp argument in question) incorrectly pointing to address 0x0 instead of a location with a NUL character or something equally valid?  As I mulled this one over, and examined all of the memory contents that I could think of, I decided to review the messages again, and compare them with other similar messages that had not caused delivery problems.  It was then that I noticed a line in the message header that looked like this (the X-Face line, originally highlighted in red):

User-Agent: KMail/1.9.1
X-Face: (?K`WPum>k,$xD:^5lco~&[g7t2C%Q5tO@~cnea''dNhA2\bd"=?iso-8859-15?q?6=5D=7DHnvlOcT=  \ 3F+=5C/=3B=25TT=7DU=0A=090jAxk-?="Wt>*Xora^<,'Eykz^Ary#B"b`7TI7*Qf(-ooWi!c([h$y19t  \ (=?iso-8859-15?q?=7DAx=5C=3DVlDK=25=25w=5Cl=0A=09=5Ey/I=5F/rn=26lR?=(t#;q>sPRN9dwE|ZStatus: R
%sml&so".vsD%^>Ce7`+^t*tx*dP"=?iso-8859-15?q?=7B8rMQHP+9=7D!54M=0A=09ndN=5CPCUl=3F3=5CQ=5BUSU=5B?=)GY:feNS-m
Message-Id: <200702052247.38968.xxxxx@xxx.xxx.xx>

*Note: Most of the X-Face header was on a single line, but that made this page hard to read, so I added line breaks where the backslashes are located.  Also, the Message-Id was doctored to protect the innocent.

Hmm, X-Face?  I did a search on X-Face, and learned that it is a mail header carrying a small picture of the sender, encoded as printable text.  I also noticed that the recipient address appeared above the X-Face line, but the sender, date, and subject all appeared below the X-Face line.  I now suspected this was significant, and decided to test my hypothesis.

Solution

(NOTE: While the solution below does in fact work, the SMTP crashing problem is now known to occur with other header lines, as documented in "More TNOS SMTP crashes and a workaround", where you will find a new workaround that should take care of them all.  --Updated 10 February, 2007)

In my Postfix header_checks file, I added the following regular expression line:
/^X-Face:/      IGNORE
With this instruction, when Postfix encounters a line in the header that begins with "X-Face:", it silently deletes that line as it continues to process the message.
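
To make the effect of that rule concrete, here is a minimal sketch in Python that mimics it outside of Postfix.  It is an illustration only, not how Postfix is implemented.  It assumes that a header folded across several lines is handled as one logical header, so continuation lines (those beginning with a space or tab) are dropped along with the first line, and that matching ignores case.  The function name drop_matching_headers is hypothetical.

```python
import re

# Same pattern as the header_checks line above; case-insensitive matching
# is an assumption made for this sketch.
X_FACE = re.compile(r"^X-Face:", re.IGNORECASE)


def drop_matching_headers(header_block, pattern=X_FACE):
    """Return header_block without any logical header matching pattern."""
    kept = []
    dropping = False
    for line in header_block.splitlines():
        if line[:1] in (" ", "\t"):
            # Folded continuation line: it shares the fate of the
            # header line that it continues.
            if not dropping:
                kept.append(line)
            continue
        # A new header starts here; decide whether to drop it.
        dropping = pattern.search(line) is not None
        if not dropping:
            kept.append(line)
    return "\n".join(kept)
```

Fed a header block containing User-Agent, a folded X-Face header, and Message-Id, the function returns only the User-Agent and Message-Id lines, which matches what was observed in the delivered message.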

I then turned TNOS' pbbs reject on, requeued the last held message using postsuper -r <queue_id>, and sure enough, it delivered as perfectly as could be!  The resulting message listing had all of the correct information: recipient, sender, date, size, and subject.  An examination of the delivered message also showed that the X-Face line had been stripped from the header as expected.

Ultimately, reject.c and probably smtpserv.c need to be tweaked so that they gracefully handle unexpected surprises without performing illegal instructions or attempting to access prohibited memory locations.  Some checks should be done to prevent a pointer being assigned 0x0.  For now, however, this works for me.
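
The kind of check meant here can be sketched as follows.  This is not TNOS code: Python stands in for C, None plays the role of the 0x0 pointer, and the helpers split_fields and safe_equal are hypothetical.  How cmd[1] actually ends up at 0x0 inside TNOS has not been established, so the short input line used below is only an assumed trigger.

```python
def split_fields(line, count=2):
    """Split line on whitespace into exactly count slots.

    Slots with no corresponding field are filled with None, much as a
    C tokenizer might leave an unused pointer at 0x0.
    """
    parts = line.split(None, count - 1)
    return parts + [None] * (count - len(parts))


def safe_equal(a, b):
    """Compare two optional strings; a missing value never matches."""
    if a is None or b is None:
        return False
    return a == b
```

For example, split_fields("Status:") leaves the second slot empty, and safe_equal on that slot simply reports no match instead of failing.  The equivalent guard in C would test the pointer before handing it to strcmp.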

In a future article, I will document another observation that I made while chasing these pesky TNOS gremlins and analyzing the fallout.  Stay tuned!

Note: The information in this article is still relevant, but the solution is superseded by More TNOS SMTP crashes and a workaround.



