
What's the best way to train SpamAssassin during continuous operation? I have a script that calls sa-learn every time a user moves an email into or out of the Spam folder on my Dovecot IMAP server. However, in about 99% of cases the trigger is a spam email that SpamAssassin failed to detect and that the user moved to the Spam folder manually. Ham is only trained when a user finds a misclassified email in the Spam folder and moves it back to the inbox.

Over time, this leads to vastly more spam training data than ham data. I am worried that this limits the effectiveness of SpamAssassin.

  1. Is this the case? Will asymmetric training affect SpamAssassin's effectiveness?
  2. What's the best way to solve this? Should I run a regular script (cron job or similar) on the users' inboxes? How can I make sure that a given inbox is good training data?
  3. Is there an existing tool or project that already tackles this problem, or do I need to do this manually?
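
To put a number on the imbalance, the learned message counts can be read from `sa-learn --dump magic`. Below is a minimal sketch of how I would check it, assuming the usual output layout in which the third column of the `nspam` and `nham` lines holds the count. The function names are my own and purely illustrative, not part of SpamAssassin.

```python
import re
import subprocess

# Assumed line layout: "0.000  0  12345  0  non-token data: nspam"
_MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
)


def parse_counts(dump_text):
    """Extract the learned spam/ham message counts from dump text."""
    counts = {}
    for line in dump_text.splitlines():
        match = _MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def spam_to_ham_ratio(counts):
    """Return nspam / nham, or infinity if no ham has been learned."""
    nham = counts.get("nham", 0)
    if nham == 0:
        return float("inf")
    return counts.get("nspam", 0) / nham


def read_live_counts():
    """Run sa-learn for the current user and parse its output."""
    result = subprocess.run(
        ["sa-learn", "--dump", "magic"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_counts(result.stdout)
```

Calling `spam_to_ham_ratio(read_live_counts())` from a cron job would let me track how lopsided the database becomes over time, but it only measures the problem and does not solve it.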

Please note that this question concerns automatic retraining during operation after the initial training, so I am not asking about initializing the classifier with good training data.

arnuschky

0 Answers