I would like to automatically remove repetitive message content in received e-mails.

For example, whenever I exchange e-mails with someone at my company, I would like to strip useless, repetitive content from the conversation, such as signatures and headers (From:, Date:, CC:, Subject: ...).

I have no experience with macros or VBA, and I couldn't find any tips in the Outlook/VB books I had access to. I have some experience with Python, but it's very basic (I worked through the Learn Python The Hard Way course).

Oliver Salzburg

1 Answer

This answer (by yours truly) provides a graphical, step-by-step walkthrough of how to set up a rule and program a macro that processes the body of each incoming email as it is received.

You would just need to adapt the processing code so that it removes the "repetitive content" you don't want to see.

The general problem of removing repetitive content is actually rather complicated, computationally.

Assuming a "repetitive string" is a substring which has already occurred within the text, you would have a loop structure like this (pseudocode, don't try to copy this into a program):

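For i = 1 To Len(str)
    For j = i To Len(str)
        ' needle is the substring running from position i to position j
        needle = Mid(str, i, j - i + 1)
        nlen = Len(needle)
        ' slide a window of the same length across the whole text
        For k = 1 To Len(str) - nlen + 1
            match = Mid(str, k, nlen)
            ' skip k = i, where the needle trivially matches itself
            If k <> i And needle = match Then
                '...do stuff
            End If
        Next
    Next
Next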

That is three nested loops over the text, so the work grows at least cubically with the length of the message. Also, this kind of loop would catch things like "Pettitte" (a last name) and change it to "Peti" (the remaining characters are substrings of length 1 which have already occurred). You'd have to set a minimum length for the "needle"; otherwise you would end up with at most one instance of every letter of the alphabet.

Then you'd have to perform some analysis on each match to determine whether it's "header text" or something else you actually want to remove. Otherwise, it would catch something like "you should not do that. I really, strongly advise that you not do that." and change it to "you should not do that. I really, strongly advise that you".
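To make the minimum-length point concrete, here is a minimal sketch in Python (chosen because the question mentions it, not because the Outlook macro would be written in it). The function name `drop_repeats` and the `min_len` parameter are made up for illustration, and the logic is a simplified greedy variant of the loop above, not something to run on real mail.

```python
def drop_repeats(text, min_len=1):
    """Naively drop every chunk of min_len characters that already occurred earlier.

    Hypothetical helper for illustration only: it shows why a minimum
    needle length matters, not how to clean real e-mail bodies.
    """
    out = []
    i = 0
    while i < len(text):
        chunk = text[i:i + min_len]
        # A full-length chunk that already appears earlier in the input is skipped.
        if len(chunk) == min_len and chunk in text[:i]:
            i += min_len
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(drop_repeats("Pettitte"))             # Peti
print(drop_repeats("Pettitte", min_len=4))  # Pettitte
```

With `min_len=1` every repeated letter is dropped, which mangles the name. Raising the minimum length protects short words, but a long enough repeated phrase in ordinary prose would still be removed.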

If you don't want to go with the general purpose (naive) way of finding duplicate content, which could delete a lot of meaningful content, you'd have to decide:

  • Which substrings to attempt to detect duplicates of;
  • Which instances of the duplicates to keep and which to delete.
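One possible way to settle both decisions, sketched in Python purely as an illustration: treat whole lines as the candidate substrings, ignore quoting markers such as ">", and keep only the first occurrence of any sufficiently long line. The name `dedupe_lines` and the threshold of 20 characters are assumptions, not anything Outlook provides.

```python
def dedupe_lines(body, min_len=20):
    """Keep the first occurrence of each long line and drop later repeats.

    Hypothetical helper: lines are compared after removing leading '>'
    quote markers and surrounding whitespace, ignoring case.
    """
    seen = set()
    kept = []
    for line in body.splitlines():
        key = line.lstrip("> \t").strip().lower()
        # Short lines ("Ok.", "Thanks") are always kept, repeated or not.
        if len(key) >= min_len and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)
```

Keeping the first occurrence suits threads where the newest reply is at the top. If replies are written below the quoted text, you might prefer to keep the last occurrence instead.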

The InStr and Mid functions in VBA should be helpful, along with the other built-in string functions in the VBA module. Press F2 in the VBA editor to open the Object Browser and see the list of available functions in the various modules.

I don't think anything like this already exists in a pre-canned form that you can just take and use, but if all you want to remove is redundant mail headers like From:, To:, Subject:, it should be fairly easy to detect them using a few substring or regex matches. If you get really stuck in the bowels of the code, I think a Stack Overflow question would be more appropriate as a follow-up.
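As a rough sketch of the regex approach for headers, here is what the matching could look like in Python (again only to show the logic; the macro itself would need the equivalent in VBA). The names `HEADER_LINE` and `strip_headers` and the exact list of header words are assumptions you would adjust to what your mail client actually inserts.

```python
import re

# Matches a line that starts (optionally after '>' quote markers) with a
# known header name followed by a colon. The header list is an assumption.
HEADER_LINE = re.compile(
    r"^\s*>*\s*(From|To|Cc|Bcc|Date|Sent|Subject):.*$",
    re.IGNORECASE,
)


def strip_headers(body):
    """Return body with every line that looks like a quoted mail header removed."""
    return "\n".join(
        line for line in body.splitlines() if not HEADER_LINE.match(line)
    )
```

Note that this would also drop an ordinary sentence that happens to begin with something like "To:", so it is worth testing against a few real threads before trusting it.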

allquixotic