# Font Subsetting and Ligatures

In my sightread.org project, I'm using [MusGlyphs](https://www.notationcentral.com/product/musglyphs/), a clever font that uses _ligatures_ to render music notation. Instead of hunting for Unicode symbols, you type intuitive letter sequences and the font transforms them into musical symbols:

* `q`: ♩ (quarter note)
* `ee`: ♫ (two beamed eighths)
* `ssss`: four beamed sixteenths
* `4/4`: time signature

This makes it trivial to display rhythm patterns in HTML: just apply the font family and type `eee` for triplets or `q.` for a dotted quarter.

After I added a few ligatures the original font didn't support and converted it to `woff2`, the full font weighs 93KB.

Here's a giant screenshot from fontdrop.info rendering all the ligatures, 973 glyphs in total: [fontdrop.info1.png](/i/posts/liga/fontdrop.info1.png)

For [SightRead.org](https://sightread.org), I only use ~30 of the 200+ ligatures. Below is the riveting story of how I got the font file size down to 8.5KB.

## The Challenge

When subsetting a font that uses ligatures, you need both the **input characters** (what you type) and the **output glyphs** (what gets rendered). Here are two strategies.

A _ligature_ is when two or more characters are combined into a single glyph (symbol). The classic example is "fi"—in many fonts, typing `f` followed by `i` renders as a joined `ﬁ` where the dot of the `i` merges with the hook of the `f`. MusGlyphs takes this concept further: typing `ee` triggers a ligature that renders as two beamed eighth notes.
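As a rough mental model of what the font does with a typed sequence, here is a toy Python sketch of a single ligature lookup: at each position it tries the rules, longest first, and replaces the first match with one ligature glyph. It is a simplification (real text shapers work on glyph IDs, follow the rule order the font defines, and run many lookups in sequence), and the rule table is hypothetical, built from glyph names that appear later in this post.

```python
# Toy model of one OpenType ligature lookup (GSUB lookup type 4).
# Simplified: works on characters, not glyph IDs, and always prefers
# the longest matching sequence.

# Hypothetical rule table: typed sequence -> ligature glyph name
LIGATURES = {
    "ee": "e_e.liga",
    "eee": "e_e_e.liga",
    "ssss": "s_s_s_s.liga",
    "q.": "q_period.liga",
    "4/4": "four_slash_four.liga",
}


def apply_ligatures(text: str, rules: dict[str, str]) -> list[str]:
    """Return the glyph sequence after ligature substitution."""
    # Try longer sequences first so "eee" wins over "ee"
    ordered = sorted(rules, key=len, reverse=True)
    glyphs = []
    i = 0
    while i < len(text):
        for seq in ordered:
            if text.startswith(seq, i):
                glyphs.append(rules[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here: the character stays its own glyph
            glyphs.append(text[i])
            i += 1
    return glyphs


print(apply_ligatures("q.eee4/4", LIGATURES))
# ['q_period.liga', 'e_e_e.liga', 'four_slash_four.liga']
```

Note that `eeee` would become `e_e_e.liga` followed by a plain `e`: the substitution is greedy, not a search for the "best" split.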

For the subsetting I'm using `pyftsubset`, a Python tool that friendlier tools such as Glyphhanger use behind the scenes.

## The Simple Approach: `--text`

The easiest way is to use `--text` with all the characters I need:

<pre class="prettyprint">
pyftsubset font.woff2 \
  --text="/().12345689behqsw" \
  --output-file=font-subset.woff2 \
  --flavor=woff2
</pre>

This tells pyftsubset: "I'll be typing these characters, figure out what glyphs I need."
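The character string doesn't have to be assembled by hand: it is simply the set of distinct characters across every sequence the page types. A small sketch, where `patterns` is a hypothetical sample rather than SightRead.org's actual list:

```python
# Hypothetical sample of the sequences a page types in the music font
patterns = ["q", "q.", "ee", "eee", "ssss", "4/4", "6/8", "h."]


def text_for_subset(sequences):
    """Collect every distinct character used, in code-point order."""
    return "".join(sorted(set("".join(sequences))))


text = text_for_subset(patterns)
print(f'--text="{text}"')
# --text="./468ehqs"
```

Sorting only makes the output stable between runs; `pyftsubset` doesn't care about the order of the characters.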

Pros:

* Simple, one command
* Automatically includes ligature outputs when you type matching sequences

Cons:

* Includes ALL ligatures that _could_ match, not just the ones you use
* May include extra glyphs you don't need
* Larger file size

In my case the result was **24KB** (down from the 93KB original) with 277 glyphs. Here's another screenshot of the result in fontdrop.info: [fontdrop.info2.png](/i/posts/liga/fontdrop.info2.png)

## The Optimal Approach: Explicit Glyph Names

For minimal size, I need to explicitly list only the glyphs I need:

<pre class="prettyprint">
pyftsubset font.woff2 \
  --glyphs=".notdef,e,s,e_e.liga,s_s_s_s.liga,..." \
  --layout-features="liga" \
  --no-layout-closure \
  --output-file=font-subset.woff2 \
  --flavor=woff2
</pre>

The key flags:
* `--glyphs` — exact glyph names (not characters)
* `--layout-features="liga"` — keep only ligature substitution rules
* `--no-layout-closure` — don't auto-add related glyphs, only use the ones I listed

The challenge: I need the internal glyph names, not the input characters. For a ligature like `ee` (two eighth notes), the glyph name might be `e_e.liga`.

### Finding Glyph Names

The secret is to use `fontTools`, the Python library that `pyftsubset` is part of, to inspect the font's ligature table and find the internal glyph names:

<pre class="prettyprint">
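from fontTools.ttLib import TTFont

f = TTFont('font.woff2')  # reading WOFF2 requires the brotli package
gsub = f['GSUB'].table
# Map each glyph name back to a character that produces it
reverse_cmap = {name: chr(cp) for cp, name in f.getBestCmap().items()}

# Collect the lookups behind the 'liga' feature; the same lookup can be
# listed under several script/language records, so dedupe with a set
liga_lookups = sorted({
    idx
    for record in gsub.FeatureList.FeatureRecord
    if record.FeatureTag == 'liga'
    for idx in record.Feature.LookupListIndex
})

for lookup_idx in liga_lookups:
    lookup = gsub.LookupList.Lookup[lookup_idx]
    for subtable in lookup.SubTable:
        # Extension lookups (type 7) wrap the real subtable
        if lookup.LookupType == 7:
            subtable = subtable.ExtSubTable
        # Only ligature substitution subtables have a .ligatures dict
        if hasattr(subtable, 'ligatures'):
            for first, ligs in subtable.ligatures.items():
                for lig in ligs:
                    components = [first] + lig.Component
                    chars = ''.join(reverse_cmap.get(g, '?') for g in components)
                    print(f"{chars} → {lig.LigGlyph}")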
</pre>

Output:
<pre class="prettyprint">
ee → e_e.liga
ssss → s_s_s_s.liga
4/4 → four_slash_four.liga
...
</pre>
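Going from that mapping to a `--glyphs` list takes one more step: keep `.notdef`, the ligature glyph for each sequence actually used, and the glyph for every character in those sequences. A minimal plain-Python sketch of that step; `used` is hypothetical, and `char_to_glyph` is a hand-written stand-in for the font's cmap.

```python
# Lines as printed by the glyph-name script above (shortened)
script_output = """\
ee → e_e.liga
ssss → s_s_s_s.liga
4/4 → four_slash_four.liga"""

# Typed sequence -> ligature glyph name
liga_map = dict(line.split(" → ") for line in script_output.splitlines())

# Character -> glyph name; in practice build this from the font's cmap:
# {chr(cp): name for cp, name in TTFont(path).getBestCmap().items()}
char_to_glyph = {"e": "e", "s": "s", "q": "q", "4": "four", "/": "slash"}

# Hypothetical: the sequences the site actually renders
used = ["q", "ee", "4/4"]


def glyphs_to_keep(used, liga_map, char_to_glyph):
    """Return the sorted glyph names to pass to --glyphs."""
    keep = {".notdef"}  # the fallback glyph every font needs
    for seq in used:
        # Single characters like "q" have no ligature entry
        if seq in liga_map:
            keep.add(liga_map[seq])
        # Keep the component glyphs too: the ligature rule can only
        # match if its input glyphs are still in the font
        keep.update(char_to_glyph[ch] for ch in seq)
    return sorted(keep)


glyphs = glyphs_to_keep(used, liga_map, char_to_glyph)
print(",".join(glyphs))
# .notdef,e,e_e.liga,four,four_slash_four.liga,q,slash
```

Keeping the component glyphs is also why a subset can contain characters that are never displayed on their own.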

Thanks to this script, I end up with a scary-looking subsetting command:

<pre class="prettyprint">
pyftsubset fonts/MusGlyphs-ss.woff2 \
  --glyphs=".notdef,b,e,e_e.liga,e_e_e.liga,e_e_e_three.liga,e_e_s_s.liga,e_period_s.liga,e_s_s.liga,e_s_s_e.liga,e_s_s_s_s.liga,eight,five,five_e.liga,five_q.liga,five_s.liga,four,four_slash_four.liga,h,h_period.liga,nine,nine_slash_eight.liga,one,one_two_slash_eight.liga,parenleft,parenleft_eight_six_parenright_b.liga,parenleft_six_eight_parenright_b.liga,parenright,period,q,q_period.liga,s,s_e_period.liga,s_e_s.liga,s_s_e.liga,s_s_e_e.liga,s_s_e_s_s.liga,s_s_s_s.liga,s_s_s_s_s_s.liga,six,six_slash_eight.liga,slash,three,three_q_q_q.liga,three_slash_four.liga,two,two_slash_four.liga,w" \
  --layout-features="liga" \
  --no-layout-closure \
  --output-file=fonts/MusGlyphs-subset.woff2 \
  --flavor=woff2
</pre>
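Maintaining a glyph list that long by hand is error-prone. One option, sketched here under the assumption that `pyftsubset` is installed and on the `PATH`, is to keep the names in a Python list and build the same invocation from it. Passing the arguments as a list to `subprocess.run` sidesteps shell quoting, so `--layout-features=liga` needs no quotes.

```python
import shlex
import subprocess

# Glyph list shortened for illustration; flags mirror the command above
GLYPHS = [".notdef", "e", "e_e.liga", "s", "s_s_s_s.liga"]


def subset_command(src, dst, glyphs):
    """Build the pyftsubset argument list for an explicit glyph subset."""
    return [
        "pyftsubset",
        src,
        "--glyphs=" + ",".join(glyphs),
        "--layout-features=liga",
        "--no-layout-closure",
        f"--output-file={dst}",
        "--flavor=woff2",
    ]


def run_subset(src, dst, glyphs):
    """Run pyftsubset; needs fontTools (plus brotli for woff2)."""
    subprocess.run(subset_command(src, dst, glyphs), check=True)


# Show the equivalent shell command without running it
cmd = subset_command("fonts/MusGlyphs-ss.woff2",
                     "fonts/MusGlyphs-subset.woff2", GLYPHS)
print(shlex.join(cmd))
```

Calling `run_subset("fonts/MusGlyphs-ss.woff2", "fonts/MusGlyphs-subset.woff2", GLYPHS)` would then write the subset file, and the glyph list can live in version control one name per line.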

### The Result

With explicit glyph names, the resulting file is **8.5KB** (vs. 24KB with `--text` and 93KB for the original). That's a **91% reduction**. Worth the extra effort for a production font? Maybe. I'd say yes.

Here's the final screenshot of the result shown in fontdrop.info and its 48 glyphs:

![48 glyphs in the final subset](/i/posts/liga/fontdrop.info3.png)

Note that this still has more glyphs than I need. For example, I don't use the flat symbol (typed as `b`) or the number 5 on their own, but the `b` and `5` glyphs are still needed as components of other ligatures.

## Which to Choose?

* `--text` — Larger size, minimal effort. Good for prototyping or when size doesn't matter.
* `--glyphs` — Minimal size, requires a script. Best for production and performance-critical use.

For fonts with many ligatures, the optimal approach can yield noticeable savings. For simple fonts, `--text` is most likely good enough.

