User defined regexp for culling the reply portion of an email

Kalle Raita

09 Jul, 2013 08:55 AM

I'd like to hide the reply part of the email, i.e., the part that just repeats the previous discussion. This is a problem for us as our support staff and customers want to do the natural thing and reply directly to the original emails.

I understand that detecting the reply in the general case is a hard nut to crack, but I think that with the right tools I could put together a 90% solution myself.

I should be able to craft regexps suitable for detecting the start of the reply part in emails from our employees and frequently served customers. Additionally, if the UI put the reply part behind an expand button instead of erasing it completely, no data would be lost if the user gets the regexps wrong.

It might also be helpful to have some magic tokens available, like [name], that would match any name (first or family) extracted from the email headers.

If you can cook up a beta version, I'd be happy to give it a trial.

Yours,
  - Kalle

  1. Support Staff 1 Posted by Courtenay on 09 Jul, 2013 10:08 AM

    We actually already use a state machine to parse our emails and remove the reply part; regexes on their own aren't powerful enough. You sort of have to build a tree and traverse back and forth holding state, comparing to see if, say, someone's forwarding an email reply, or whatever.

    It works pretty well, too; if some cases aren't being handled, we can just add them. Usually any cases that aren't handled come from a weird mail client or a European date format.

    Incidentally, you can add whatever custom JS you want to your UI; there's a big text field for it in your settings, so you can hide things according to whatever you like.

  2. 2 Posted by Kalle Raita on 09 Jul, 2013 11:12 AM

    Hi Courtenay,

    Thanks for the speedy response. If I understood your reply correctly, the reply part should normally be stripped. This is not the case for the majority of our emails going through Tender. Could you take a look and see whether you could add a sprinkle of mail client & date format magic based on our DB, please?

    Yours,
      - Kalle

  3. 3 Posted by Julien on 09 Jul, 2013 05:34 PM

    Hi Kalle,

    The reason most of your emails are not stripped correctly is that you are not replying to Tender notifications, but emailing clients directly. Our notifications contain clues that help us find the reply part correctly. We could do a better job of parsing replies to normal emails, though.

    As I asked in the other thread, is there any reason you are not replying to Tender notifications and prefer to email users directly?

    Thanks.

  4. 4 Posted by Kalle Raita on 15 Jul, 2013 01:06 PM

    Hi Julien,

    Sorry for the delayed reply. As I just explained in the other thread, our team would prefer that Tender take the role of a back-seat observer rather than a nexus for generating emails.

    Were you able to pick up any hints for improving the reply parsing from our DB, perhaps?

    Yours,
      - Kalle

  5. 5 Posted by Julien on 15 Jul, 2013 08:46 PM

    Hi Kalle,

    I just deployed a fix for this. If you see comments that have not been parsed correctly (i.e. the reply part was not removed), can you post them here? I'll take a look.

    Thanks!

  6. 6 Posted by Kalle Raita on 17 Jul, 2013 01:48 PM

    Hi Julien,

    The fix improved the handling for at least one of our supporters. Thanks!

    I don't know what your take is on detecting blocks like this as the start of a reply-repeat:
    <clip>
    From: John Smith [mailto:[email blocked]]
    Sent: 17 July 2013 11:20
    To: Jane Doe
    Cc: [email blocked]
    Subject: Re: Release feedback
    </clip>

    Yours,
      - Kalle
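
A header block like the one quoted in this post could, in principle, be detected with a pattern along the following lines. This is only a minimal sketch, not Tender's parser (which the staff describe as a state machine): the names `HEADER_BLOCK` and `split_reply` are hypothetical, and the rule "a `From:` line followed by at least two more header lines" is an assumption chosen to limit false positives.

```python
import re

# Assumed rule: a "From:" line immediately followed by at least two more
# header-style lines (Sent/To/Cc/Subject), as in the quoted block.
HEADER_BLOCK = re.compile(
    r"^[ \t]*From:[^\n]*\n"
    r"(?:[ \t]*(?:Sent|To|Cc|Subject):[^\n]*\n?){2,}",
    re.MULTILINE,
)


def split_reply(body: str) -> tuple[str, str]:
    """Return (visible, hidden); hidden is empty when no header block is found."""
    match = HEADER_BLOCK.search(body)
    if match is None:
        return body, ""
    # Everything from the header block onward is treated as the repeated reply.
    return body[:match.start()].rstrip(), body[match.start():]
```

Keeping the hidden part as a second return value, rather than discarding it, fits the request made earlier in the thread that nothing be lost when a pattern misfires.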

  7. 7 Posted by Kalle Raita on 22 Jul, 2013 09:02 AM

    How does the reply detector respond to this general pattern?

    What color is the rose?

    > foo
    >
    > bar
    >

    -- IMPORTANT NOTICE: <very formal language>

  8. 8 Posted by Julien on 22 Jul, 2013 06:16 PM

    Hey Kalle,

    It should not be a match. I specifically look for signatures in the format:

    --
    some text
    

    so this doesn't fit:

    -- some text
    

    We can always improve the regex, but we also need to be careful of false positives, so I'd rather not make it match too much stuff.
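
The distinction drawn in this post, a `--` on a line of its own versus `--` followed by text on the same line, can be illustrated with a small sketch. The actual regex used by Tender is not shown in the thread; the pattern and the name `has_signature` below are assumptions for illustration only.

```python
import re

# Assumed pattern: a line containing only "--" (optionally followed by
# spaces or tabs), then a next line with at least one non-blank character.
SIGNATURE = re.compile(r"^--[ \t]*\n(?=[^\n]*\S)", re.MULTILINE)


def has_signature(body: str) -> bool:
    """Return True if the body contains a delimiter line followed by text."""
    return SIGNATURE.search(body) is not None
```

Under this sketch, `"--\nsome text"` matches, while `"-- some text"` and `"-- IMPORTANT NOTICE: ..."` do not, because the delimiter must be alone on its line.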

  9. 9 Posted by Julien on 25 Jul, 2013 05:29 AM

    Hi,

    We had some issues with email notifications over the weekend and it's possible you missed some of our replies to this discussion. You can see the full discussion online by clicking the link 'view this discussion online' in the grey footer at the bottom.

    We apologize for the inconvenience.

    Let us know if there is anything else we can do for you.

  10. 10 Posted by Julien on 25 Jul, 2013 05:36 AM

    Hey Kalle,

    Sorry for the automated reply, I had a lot of discussions to go through.

    Let me know what you think regarding signatures.

    Cheers.

  11. 11 Posted by Kalle Raita on 26 Jul, 2013 07:19 AM

    Hi Julien,

    I can understand that the pattern "[line start]-- foo" is not the safest trigger.

    The legalese barf unfortunately prevents the obvious reply part from being hidden.

    Without knowing anything about your system internals:
    a) Would it be possible to track senders and how much their emails have in common at the end? If there is a repeating string, that's a signature / garbage. Knowing the corporate world, emails coming from the same domain might also be used to detect the legalese barf.
    b) If the hidden reply part were easily accessible ('+' button to make the reply part visible, for example), using more aggressive detectors would be feasible, as no information would be permanently discarded.

    Yours,
      - Kalle
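
Suggestion a) in this post could be prototyped roughly as follows: collect several message bodies from one sender and look for the lines they all share at the end. This is a hedged sketch only; the function name `common_trailer` and the `min_lines` threshold are hypothetical, and a real implementation would need the additional heuristics discussed in the thread.

```python
def common_trailer(bodies: list[str], min_lines: int = 2) -> list[str]:
    """Return the trailing lines shared by every body, or [] if too few."""
    if len(bodies) < 2:
        return []
    # Compare stripped, non-empty lines, starting from the end of each message.
    reversed_lines = [
        [line.strip() for line in body.splitlines() if line.strip()][::-1]
        for body in bodies
    ]
    shared = []
    for lines in zip(*reversed_lines):
        if len(set(lines)) != 1:
            break  # first line (from the end) where the messages differ
        shared.append(lines[0])
    shared.reverse()
    # Require a minimum length so a single common closing word is ignored.
    return shared if len(shared) >= min_lines else []
```

For two messages that both end with a `--` line, a company name, and the same confidentiality notice, the function returns those three lines; for messages with nothing in common at the end it returns an empty list.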

  12. 12 Posted by Julien on 30 Jul, 2013 07:20 PM

    Hi Kalle,

    Detecting replies only happens for email, and you always have access to the full content by clicking the "ORIGINAL" button (see screenshot). We can improve the detection of replies with well-defined patterns, but I don't want to remove too much. People get confused when something is missing from a reply, and they often email us asking what is wrong. Being too aggressive with parsing would definitely confuse more users than it would help.

    Regarding a), while it may be technically feasible, that is a lot of work and a lot of heuristics. The time it would take seems way too high for the improvement it would bring, compared to other features that are waiting for dev time. I'm unfortunately gonna say no.

    If we are missing replies that should obviously be detected, feel free to keep sending examples. I'll continue to improve the parsing as long as it's not overly aggressive.

    Cheers!

  13. Julien closed this discussion on 13 Aug, 2013 11:44 PM.