Calculating hash part of MSIX Package Family Name

The package family name is an important property that identifies each MSIX package installed on Windows 10 / Windows 11. It consists of the package name, followed by an underscore and a special string of 13 letters and digits. This value is used in many places, but the easiest one to spot is the folder path under which MSIX files are stored.

For example, the package family name of MSIX Hero is MSIXHero_zxq1da1qqbeze. The same 13-character string also ends the package full name, MSIXHero_2.2.56.0_neutral__zxq1da1qqbeze. You can see the family name by invoking the PowerShell cmdlet Get-AppxPackage <name> and looking at the PackageFamilyName property,

… or by opening the package in MSIX Hero, which also shows the value.

The first part of the family name is simply the package name. This is something the author defines in the manifest file. More interesting is the second part: it is not written directly into the manifest, yet for a given package it is always the same on all machines, and it changes only when the publisher changes. For this reason, it is sometimes called a “publisher hash”. It also has some interesting properties:

  • It is always a string of exactly 13 letters and/or numbers
  • It avoids certain letters (for example “i” and “o” – you will never find them in the family name).
  • It changes together with the full publisher name. Changing the display values or other package identity fields has no impact on it.
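
These properties can be turned into a quick sanity check. The following is only a sketch: the helper name LooksLikePublisherHash is made up for this post, and the character class assumes the lowercased Crockford Base32 alphabet (digits plus letters without i, l, o and u). It checks the shape of the value only, not whether it belongs to any real publisher.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a Windows API): accepts exactly 13 characters
// taken from the digits and the lowercase letters except i, l, o and u.
static bool LooksLikePublisherHash(string value) =>
    Regex.IsMatch(value, "^[0-9a-hjkmnp-tv-z]{13}$");

// The publisher hash is whatever follows the last underscore of the family name.
string familyName = "MSIXHero_zxq1da1qqbeze";
string candidate = familyName.Substring(familyName.LastIndexOf('_') + 1);

Console.WriteLine(LooksLikePublisherHash(candidate));    // True
Console.WriteLine(LooksLikePublisherHash("microsoft"));  // False (too short, contains i and o)
```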

This post will explain how the value can be calculated and how to implement the algorithm in C#. Since the hash algorithm is constant on all machines, you do not have to install the package to know its family name / publisher hash.

TLDR

A C# implementation that calculates the publisher hash from the publisher name can be found in my GitHub gist. This blog post also contains PowerShell code that does the same.

The algorithm

Here are the steps to calculate the publisher hash:

  1. Take the publisher name as a UTF-16 (little-endian) string, as-is, with all spaces and punctuation.
  2. Calculate the SHA-256 hash of the byte representation of this string.
  3. Take the first 8 bytes (64 bits) of the hash.
  4. Pad the binary value with a single zero bit on the right (= left shift all bits by one).
  5. Split the bits into groups of 5 (since we have 64 + 1 bits, we get 13 groups of 5 bits each).
  6. For each group, convert the bit representation to an integer, and perform a look-up in a replacement table mapping the numbers to letters and digits.
  7. Join the letters together and make them lowercase to receive the publisher hash.

As a matter of fact, steps 5-6 represent an algorithm that converts bytes into a textual 32-symbol notation, known as Douglas Crockford's Base32. It uses a special replacement table that:

  1. Has exactly 32 input values and matching output values.
  2. Uses a subset of the English alphabet and digits, and avoids letters that are easily confused by humans (for example, it never encodes to “I”, “L” or “O”, because they look like the digits 1 and 0; “U” is left out as well).

Example

Let’s repeat the steps using a real-life example.

The publisher name of MSIX Hero is:

E=marcin@otorowski.com, CN=Marcin Otorowski, O=Marcin Otorowski, S=zachodniopomorskie, C=PL

First, let’s convert it to a UTF-16 (little-endian) byte sequence. It is important that we take all characters, casing and punctuation as-is. The string is represented by the following bytes:

0x45, 0x00, 0x3D, 0x00, 0x6D, 0x00, 0x61, 0x00, 0x72, 0x00, 0x63, 0x00, 0x69, 0x00, 0x6E, 0x00, 0x40, 0x00, 0x6F, 0x00, 0x74, 0x00, 0x6F, 0x00, 0x72, 0x00, 0x6F, 0x00, 0x77, 0x00, 0x73, 0x00, 0x6B, 0x00, 0x69, 0x00, 0x2E, 0x00, 0x63, 0x00, 0x6F, 0x00, 0x6D, 0x00, 0x2C, 0x00, 0x20, 0x00, 0x43, 0x00, 0x4E, 0x00, 0x3D, 0x00, 0x4D, 0x00, 0x61, 0x00, 0x72, 0x00, 0x63, 0x00, 0x69, 0x00, 0x6E, 0x00, 0x20, 0x00, 0x4F, 0x00, 0x74, 0x00, 0x6F, 0x00, 0x72, 0x00, 0x6F, 0x00, 0x77, 0x00, 0x73, 0x00, 0x6B, 0x00, 0x69, 0x00, 0x2C, 0x00, 0x20, 0x00, 0x4F, 0x00, 0x3D, 0x00, 0x4D, 0x00, 0x61, 0x00, 0x72, 0x00, 0x63, 0x00, 0x69, 0x00, 0x6E, 0x00, 0x20, 0x00, 0x4F, 0x00, 0x74, 0x00, 0x6F, 0x00, 0x72, 0x00, 0x6F, 0x00, 0x77, 0x00, 0x73, 0x00, 0x6B, 0x00, 0x69, 0x00, 0x2C, 0x00, 0x20, 0x00, 0x53, 0x00, 0x3D, 0x00, 0x7A, 0x00, 0x61, 0x00, 0x63, 0x00, 0x68, 0x00, 0x6F, 0x00, 0x64, 0x00, 0x6E, 0x00, 0x69, 0x00, 0x6F, 0x00, 0x70, 0x00, 0x6F, 0x00, 0x6D, 0x00, 0x6F, 0x00, 0x72, 0x00, 0x73, 0x00, 0x6B, 0x00, 0x69, 0x00, 0x65, 0x00, 0x2C, 0x00, 0x20, 0x00, 0x43, 0x00, 0x3D, 0x00, 0x50, 0x00, 0x4C, 0x00
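
The conversion above can be reproduced with a few lines of C#. This is a minimal sketch; it relies on Encoding.Unicode being UTF-16 little-endian, and GetBytes does not emit a byte order mark, which is why the sequence starts directly with 0x45.

```csharp
using System;
using System.Text;

string publisher = "E=marcin@otorowski.com, CN=Marcin Otorowski, O=Marcin Otorowski, S=zachodniopomorskie, C=PL";

// Encoding.Unicode is UTF-16 little-endian, so every ASCII character
// becomes its character code followed by 0x00.
byte[] utf16 = Encoding.Unicode.GetBytes(publisher);

Console.WriteLine(utf16.Length == publisher.Length * 2);  // True
Console.WriteLine(BitConverter.ToString(utf16, 0, 8));    // 45-00-3D-00-6D-00-61-00
```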

Now we calculate the SHA-256 hash of these bytes:

0xFF, 0x6E, 0x16, 0xA8, 0x37, 0xBA, 0xDD, 0xF7, 0x77, 0xE2, 0x58, 0x8A, 0x3B, 0x5A, 0x53, 0xEB, 0x1F, 0xA5, 0x24, 0x2A, 0xE1, 0x5E, 0xF6, 0x52, 0x2E, 0x93, 0x75, 0x47, 0xA2, 0xF7, 0x5E, 0x0B

SHA-256 produces a 256-bit (32-byte) sequence, no matter what the input string is. Only the first 8 bytes of the hash (0xFF through 0xF7 in the listing above) take part in further calculation. We convert them to a binary representation (a string containing only “0” and “1”):

11111111, 01101110, 00010110, 10101000, 00110111, 10111010, 11011101, 11110111
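
The hashing step and the binary view of the first 8 bytes can be sketched as follows. The variable names are illustrative only. No expected output is shown for the bit string here; for the publisher name above it should correspond to the values listed in this post.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

string publisher = "E=marcin@otorowski.com, CN=Marcin Otorowski, O=Marcin Otorowski, S=zachodniopomorskie, C=PL";

// SHA-256 over the UTF-16 LE bytes of the publisher name.
using SHA256 sha = SHA256.Create();
byte[] hash = sha.ComputeHash(Encoding.Unicode.GetBytes(publisher));

// Only the first 8 bytes are used; each one is shown as 8 binary digits.
byte[] first8 = hash.Take(8).ToArray();
string bits = string.Join(", ", first8.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));

Console.WriteLine(hash.Length);  // 32
Console.WriteLine(bits);
```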

The value has 64 bits in total, but according to the description we need 65 (a number divisible by 5). Let’s pad an extra zero on the right (in other words, left shift all bits by one). Then, let’s split them into 13 groups of 5 bits each:

11111 11101 10111 00001 01101 01010 00001 10111 10111 01011 01110 11111 01110

Now let’s change them into integers (note: with 5 bits, each group may represent an integer between 0 and 31). For example, the first group is 11111 (BIN), which is a binary representation of number 31 (DEC). The second group is 11101 (BIN), which is 29 (DEC). When applied to all other groups, this gives the following:

31, 29, 23, 1, 13, 10, 1, 23, 23, 11, 14, 31, 14
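
The same grouping can be done with integer arithmetic instead of strings of “0” and “1”. This is just one possible sketch, not the approach used in the implementation later in this post: since 65 bits do not fit into a 64-bit integer, the first 12 groups are read as 5-bit windows and the last group is built from the remaining 4 bits plus the padded zero.

```csharp
using System;
using System.Buffers.Binary;

// First 8 bytes of the SHA-256 hash from the example above.
byte[] first8 = { 0xFF, 0x6E, 0x16, 0xA8, 0x37, 0xBA, 0xDD, 0xF7 };

// Read them as one big-endian number, so the first byte lands in the top bits.
ulong value = BinaryPrimitives.ReadUInt64BigEndian(first8);

int[] groups = new int[13];
for (int i = 0; i < 12; i++)
{
    // Groups 0..11 are 5-bit windows taken from the most significant end.
    groups[i] = (int)((value >> (59 - 5 * i)) & 0x1FUL);
}

// Group 12 is the remaining 4 bits followed by the single padding zero.
groups[12] = (int)((value & 0xFUL) << 1);

Console.WriteLine(string.Join(", ", groups));
// 31, 29, 23, 1, 13, 10, 1, 23, 23, 11, 14, 31, 14
```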

The last part is to translate the numbers using the Base32 encoding table (taken from crockford.com/base32.html):

Symbol value    Encoded symbol    Symbol value    Encoded symbol
0               0                 16              G
1               1                 17              H
2               2                 18              J
3               3                 19              K
4               4                 20              M
5               5                 21              N
6               6                 22              P
7               7                 23              Q
8               8                 24              R
9               9                 25              S
10              A                 26              T
11              B                 27              V
12              C                 28              W
13              D                 29              X
14              E                 30              Y
15              F                 31              Z

For example, the first number on our list is 31, followed by 29. We look each number up among the symbol values and take the encoded symbol listed next to it. So the first two numbers are represented by Z followed by X. Continuing for all other numbers, we get the following letters:

Z, X, Q, 1, D, A, 1, Q, Q, B, E, Z, E

Now join them together and convert to the lowercase string. This gives us the final value:

zxq1da1qqbeze
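
The table look-up and the final lowercasing fit into a few lines. In this sketch the alphabet string is simply the encoded symbols from the table, in order of their values, so the group value can be used directly as an index. The names are illustrative only.

```csharp
using System;
using System.Linq;

// Encoded symbols of Crockford's Base32, ordered by value (0..31).
const string CrockfordAlphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// The 13 five-bit groups from the example above.
int[] groups = { 31, 29, 23, 1, 13, 10, 1, 23, 23, 11, 14, 31, 14 };

string publisherHash = new string(groups.Select(g => CrockfordAlphabet[g]).ToArray()).ToLowerInvariant();

Console.WriteLine(publisherHash);                // zxq1da1qqbeze
Console.WriteLine($"MSIXHero_{publisherHash}");  // MSIXHero_zxq1da1qqbeze
```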

When combined with the package name (MSIXHero), with _ (underscore) as a separator, it gives the package family name:

MSIXHero_zxq1da1qqbeze

Implementation (C#)

A simple, tiny and dependency-free implementation in C#:

// using System;
// using System.Linq;
// using System.Security.Cryptography;
// using System.Text;

public static string GetPublisherHash(string publisherId)
{
    // SHA-256 over the UTF-16 LE bytes of the publisher name, taken as-is.
    using var sha = SHA256.Create();
    var encoded = sha.ComputeHash(Encoding.Unicode.GetBytes(publisherId));
    // First 8 bytes as a bit string plus one padding zero: 65 bits = 13 * 5.
    var binaryString = string.Concat(encoded.Take(8).Select(c => Convert.ToString(c, 2).PadLeft(8, '0'))) + '0';
    // Each 5-bit group is an index into the Crockford Base32 alphabet.
    var encodedPublisherId = string.Concat(Enumerable.Range(0, binaryString.Length / 5).Select(i => "0123456789ABCDEFGHJKMNPQRSTVWXYZ".Substring(Convert.ToInt32(binaryString.Substring(i * 5, 5), 2), 1)));
    return encodedPublisherId.ToLowerInvariant();
}

Not willing to reinvent the wheel? Base32 encoding is widely used in the industry, and there are many libraries out there that do the work; see for example https://github.com/ssg/SimpleBase. The code above just demonstrates the concept.

Implementation (PowerShell)

function Get-PublisherHash($publisherName)
{
    $publisherNameAsUnicode = [System.Text.Encoding]::Unicode.GetBytes($publisherName);
    $publisherSha256 = [System.Security.Cryptography.SHA256]::Create().ComputeHash($publisherNameAsUnicode);
    $publisherSha256First8Bytes = $publisherSha256 | Select-Object -First 8;
    $publisherSha256AsBinary = $publisherSha256First8Bytes | ForEach-Object { [System.Convert]::ToString($_, 2).PadLeft(8, '0') };
    $asBinaryStringWithPadding = [System.String]::Concat($publisherSha256AsBinary).PadRight(65, '0');

    $encodingTable = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    $result = "";
    for ($i = 0; $i -lt $asBinaryStringWithPadding.Length; $i += 5)
    {
        $asIndex = [System.Convert]::ToInt32($asBinaryStringWithPadding.Substring($i, 5), 2);
        $result += $encodingTable[$asIndex];
    }

    return $result.ToLower();
}

As already mentioned in the C# section, you do not have to write base32 encoding on your own. A lot of excellent modules are available, for example https://www.powershellgallery.com/packages/BaseEncoder/1.0.0.0/Content/BaseEncoder.psm1.

Points of interest

  • Only the first eight bytes of the SHA-256 hash take part in further calculations. This makes it theoretically possible for two products with a sufficiently generic name (MyAwesomeGame or Editor) to end up with the same family name for two different publishers. Such a collision is very unlikely, but still possible.
  • The package family name changes whenever the publisher name changes. The change can be as small as different casing or spacing, not to mention the situation in which the publisher name changes because the certificate has a different subject. On the other hand, this means that if a company moves to another address (and changes its certificate subject), it breaks the update chain, because the new packages will have a different family name.
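
To get a feeling for how unlikely such a collision is, the usual birthday-bound approximation can be applied. This is a rough sketch under two assumptions: the 64 bits taken from SHA-256 behave like uniformly random values, and the result is much smaller than 1 (otherwise the approximation is no longer valid). The function name is made up for this post. Note that it estimates collisions of the publisher hash only; for two family names to collide, the package names would have to be identical as well.

```csharp
using System;

// Approximate probability that at least two of n different publisher names
// share the same 64-bit value: n * (n - 1) / 2 pairs, each colliding with
// probability 1 / 2^64. Valid only while the result is much smaller than 1.
static double PublisherHashCollisionProbability(double publishers) =>
    publishers * (publishers - 1) / Math.Pow(2, 65);

// For one million distinct publishers the estimate is roughly 2.7e-8.
Console.WriteLine(PublisherHashCollisionProbability(1_000_000));
```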
