Loic Newbie
Joined: 29 November 2012
Online Status: Offline Posts: 5
Posted: 08 January 2013 at 7:12am
Hi!
For unit tests, I run the following steps:
1. Create 10 MailMessages and train the antispam filter to detect them as spam => OK
2. Create 10 more MailMessages and train the antispam filter to detect them as non-spam => OK
3. Test the score of each mail => OK (10 spam and 10 non-spam detected)
4. Train the antispam filter again with the 10 MailMessages previously marked as non-spam, this time as spam => OK
5. Test the score of each mail again => PROBLEM: no mails are marked as spam.
I don't reload the antispam filter during the test.
I only use TrainFilter, SaveDatabase and ScoreMessage.
What is the logic behind training the antispam filter?
Igor AfterLogic Support
Joined: 24 June 2008 Location: United States
Online Status: Offline Posts: 6104
Posted: 08 January 2013 at 11:59pm
A Bayesian filter is probability-based, which is why it is usually trained with hundreds of mails to get decent spam detection results. We don't know what the spam score will be if you train the spam filter with the same messages on both the "Spam" and "Not Spam" sides. We might assume the number of non-spam mails has been set to zero that way, and the filter of course needs both spam and non-spam training.
Also, I wonder whether calling SaveDatabase and LoadDatabase before step 5 changes anything.
One more thing: if you keep getting a score of exactly 50, this would indicate some kind of problem with the database.
We might be able to help you further if you send us a ZIP of the sample messages, so that we can reproduce the situation here and see whether anything goes wrong.
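To illustrate why retraining a few messages may not flip the verdict, here is a toy word-count Bayesian filter in Python. This is only a sketch under assumptions: it is not the algorithm used by the actual antispam component, the class `ToyBayesFilter`, its methods and the sample messages are all hypothetical, and it assumes that training is purely additive (training a message as spam does not remove it from the non-spam side).

```python
from collections import Counter


class ToyBayesFilter:
    """Hypothetical toy filter for illustration; not the real component."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam mails containing it
        self.ham_words = Counter()   # word -> number of non-spam mails containing it
        self.spam_count = 0
        self.ham_count = 0

    def train(self, text, is_spam):
        # Assumption: training only ever adds counts, it never removes any.
        words = set(text.lower().split())
        if is_spam:
            self.spam_words.update(words)
            self.spam_count += 1
        else:
            self.ham_words.update(words)
            self.ham_count += 1

    def score(self, text):
        """Return a spam score from 0 to 100 (equal priors assumed)."""
        spam_likelihood = 1.0
        ham_likelihood = 1.0
        for word in set(text.lower().split()):
            # Laplace smoothing keeps unseen words from zeroing a product.
            spam_likelihood *= (self.spam_words[word] + 1) / (self.spam_count + 2)
            ham_likelihood *= (self.ham_words[word] + 1) / (self.ham_count + 2)
        return 100.0 * spam_likelihood / (spam_likelihood + ham_likelihood)


def demo():
    spam_mails = ["buy cheap pills now", "cheap pills offer"]
    ham_mails = ["meeting agenda for monday", "project meeting notes"]

    bayes = ToyBayesFilter()
    for mail in spam_mails:
        bayes.train(mail, is_spam=True)
    for mail in ham_mails:
        bayes.train(mail, is_spam=False)
    before = bayes.score(ham_mails[0])

    # Step 4 of the test: train the non-spam mails again, now as spam.
    for mail in ham_mails:
        bayes.train(mail, is_spam=True)
    after = bayes.score(ham_mails[0])
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("score before retraining:", before)
    print("score after retraining:", after)
```

In this toy model the score of the retrained mail goes up but stays below 50, because the earlier non-spam counts are still in the database. An empty toy filter scores every message at exactly 50, since it has no evidence for either side. Whether the real filter behaves the same way depends on its implementation.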
--
Regards,
Igor, AfterLogic Support
Loic Newbie
Joined: 29 November 2012
Online Status: Offline Posts: 5
Posted: 09 January 2013 at 6:19am
I think my test is irrelevant. Testing a probability-based algorithm is too random.