Trying to make use of Outlook’s Thread-Index: header

2007/12/11

Recently I had to reconstruct a thread of email messages using the Thread-Index: header, which Microsoft’s products use instead of the standard threading headers Message-Id:, References: and In-Reply-To:.

The truth is that I was really frustrated, thinking that Microsoft was breaking the standards by using custom headers that do not begin with X-, but as Dan Bernstein points out:

“822 promised that the IETF would never define field names beginning with X-. It did not prohibit use of non-X names by other organizations.”

This means that Microsoft is allowed to add Thread-Index: (and Thread-Topic:) without breaking any standards. On the other hand, Microsoft does not document anywhere (at least anywhere I looked, and I looked plenty) how Thread-Index: is calculated and how it can be decoded so that applications other than Outlook can make use of it.

After some experimenting and a little bit of reverse engineering, I have reached the following conclusions:

  • Thread-Topic: preserves the original subject of the thread, that is, the Subject: stripped of any Fw: or Re: prefixes.
  • Thread-Index: is used in a way similar to In-Reply-To: and References: Assuming that the first message in a thread has a:
    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QA==

    and the next in thread:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA

    while a third one:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=

    and a fourth one:

    Thread-Index: AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==

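    the pattern that decides the threading seems obvious: ignoring the trailing equal signs, each reply carries its parent’s value with a few more characters appended. I have not yet found out what the single or double equal sign suffix means.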
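The prefix pattern in the four example headers can be turned into a small threading routine. What follows is only a sketch in Python, not Outlook's own method: the names `normalize` and `find_parents` are made up for illustration, and it assumes that dropping the trailing equal signs and comparing the remaining text is enough. That holds for the four headers above; whether it holds for every header is an assumption.

```python
from typing import Dict, List, Optional

# The four example Thread-Index: values quoted in the post.
EXAMPLE_HEADERS = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]


def normalize(thread_index: str) -> str:
    """Drop surrounding whitespace and the trailing '=' signs (hypothetical helper)."""
    return thread_index.strip().rstrip("=")


def find_parents(thread_indexes: List[str]) -> Dict[str, Optional[str]]:
    """Map each value to the longest other value that is a textual prefix of it.

    A value with no such prefix is treated as the start of a thread (None).
    """
    normalized = {value: normalize(value) for value in thread_indexes}
    parents: Dict[str, Optional[str]] = {}
    for value, key in normalized.items():
        candidates = [
            other
            for other, other_key in normalized.items()
            if other != value
            and len(other_key) < len(key)
            and key.startswith(other_key)
        ]
        # The closest ancestor is the candidate with the longest normalized text.
        parents[value] = max(
            candidates, key=lambda other: len(normalized[other]), default=None
        )
    return parents


for child, parent in find_parents(EXAMPLE_HEADERS).items():
    print(child, "<-", parent)
```

With the example headers, the first value has no parent and each later value is attached to the one just before it, which is the chain described in the list above.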

If only Microsoft could make such simple information available! Think of all the lost work hours! Only after I had resolved my problem did I find out about these guys, who had arrived at similar conclusions about the usage of Thread-Index:

Update #1: You may be interested to read the next episode.

Update #2: Yes, I keep refusing the BASE64 explanation. This is because what the BASE64 value decodes to is either meaningless or has no known semantics.

Update #3: From the GNOME documentation: The value is apparently unique but has no meaning we know of. That is why I refuse the BASE64 explanation. It looks like a BASE64 string, and it can be decoded into a string of bytes that one can represent as a number. But the questions remain unanswered: How is the first 22-byte value chosen? Why is every “next” value in a thread 5 bytes longer than the previous one? How are these 5 bytes chosen? The decoded value of an undocumented BASE64 string remains undocumented, hence it may not even be a BASE64 string at all (and may only coincidentally look like one).
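The byte counts can be checked directly against the example headers. This is a minimal sketch that treats the values as BASE64 purely to measure them; `decoded_length` is a name made up for illustration. It says how long each decoded value is, not what the bytes mean.

```python
import base64

# The four example Thread-Index: values quoted in the post.
EXAMPLE_HEADERS = [
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QA==",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbA",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fw=",
    "AcdyY+a08VX8xfobTsy61v9NHPZ7QAAAiXbAAAXP5fwAABGXGw==",
]


def decoded_length(thread_index: str) -> int:
    """Number of bytes obtained when the value is decoded as BASE64."""
    return len(base64.b64decode(thread_index))


lengths = [decoded_length(value) for value in EXAMPLE_HEADERS]
# Difference in decoded size between each message and the one before it.
growth = [later - earlier for earlier, later in zip(lengths, lengths[1:])]

print(lengths)  # [22, 27, 32, 37]
print(growth)   # [5, 5, 5]
```

For these four headers, the first value decodes to 22 bytes and every following one is 5 bytes longer.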


The example Thread-Index: headers are taken from the MediaDefender Defenders site

9 Responses to “Trying to make use of Outlook’s Thread-Index: header”

  1. Apostolos Says:

    They seem to me like base64 encoded strings (because of the equal signs at the end).

  2. adamo Says:

    @Apostolos:
    Unfortunately, they are not :(

  3. Tack Says:

    @adamo:

    Apostolos is correct. They are base64 encoded.

  4. Arboleda Says:

    @adamo:

    Apostolos is correct. They are base64 encoded.

  5. adamo Says:

    To the next person that will insist that they are base64 encoded:

    1- Please try and decode the string. Then come up with a meaningful explanation of the result.

    2- Still not convinced? Read this paper [pdf].

    3- Still not convinced? Read my next post on the subject.

  6. Ryan Mauger Says:

    The above people ARE (sort of) correct.
    base64 is NOT for encoding strings. It is for encoding BINARY data in an ASCII string, to make it safe for ASCII-mode data transfer (as is required by SMTP).
    The result still looks meaningless when decoded, because you’re trying to read it as a string when it is, in fact, binary data.

    • adamo Says:

      It is OK for you to believe that I did not try to see it as binary data because you do not know me. The above people (including you) are NOT correct, or are leaving guesswork to the reader. One can give any x as input to an f(x) and get some output. This is the case here with base64. However, as long as someone is not telling what the resulting binary data stands for, it still is undocumented garbage.

      So please provide an explanation of the binary data we are all looking at in order to provide a correct (and complete) answer. What is it that you are seeing there? Is it a number? A random byte sequence? Something else? But do not tell me that it is binary data without telling me what it stands for. Everything can be viewed as binary data if it suits the purpose. So the question remains.

  7. Mrten Says:

    It’s an OLE timestamp (22 bytes) with timediffs appended (5 bytes each), which sucks, because the timestamp is not guaranteed to be unique.

