Pandoc Filters in Lua

You might remember that I’m using Pandoc to convert entries of this blog into PDF files. While doing that, I started using Lua filters for Pandoc to drive and customize the conversion process. This article explains some of those filters.

Prepending Domains to Relative URLs

I write the Markdown source files for my Hugo sites using relative URLs instead of hardcoded ones that include the domain. This makes it easy to move the website to another domain in the future but, of course, those relative links are broken in the PDF generated from the same sources.

The following Pandoc filter in Lua prepends the domain of your choice to every link target that does not already start with “http”:

function Link(link)
  if not link.target:match("^http") then
    link.target = "https://your.site.url" .. link.target
  end
  return link
end
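
The check above treats every target that does not start with “http” as site-relative, so it would also rewrite fragment links such as #section or mailto: addresses. Here is a minimal sketch of a stricter variant, assuming that only root-relative paths (those starting with a single slash) should be rewritten. BASE_URL and is_root_relative are names made up for this illustration:

```lua
-- Hypothetical base URL; replace it with your own domain.
local BASE_URL = "https://your.site.url"

-- Returns true only for root-relative paths such as "/blog/post/".
-- Protocol-relative URLs ("//cdn.example.com/x") are excluded.
local function is_root_relative(target)
  return target:sub(1, 1) == "/" and target:sub(2, 2) ~= "/"
end

function Link(link)
  if is_root_relative(link.target) then
    link.target = BASE_URL .. link.target
  end
  return link
end
```

With this variant, targets such as #section, mailto:someone@example.com, and absolute URLs pass through untouched.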

Here’s how to use it: the original source.md file contains a link to the Pandoc website, and another to a post on this website.

$ cat source.md

You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](/blog/exporting-hugo-to-pdf/). While doing that, I
started using Lua filters for Pandoc to drive and customize the
conversion process. This article explains some of those filters.

After applying the filter, the local link includes the “https://your.site.url” domain:

$ pandoc --from=markdown --to=markdown --lua-filter=normalize-links.lua source.md

You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](https://your.site.url/blog/exporting-hugo-to-pdf/). While doing that, I
started using Lua filters for Pandoc to drive and customize the
conversion process. This article explains some of those filters.

Transforming Metadata Blocks

Markdown sources used for Hugo sites include a metadata block at the beginning of the file, with information such as the title, the date, and the author. However, when creating PDF or EPUB files, you might want to concatenate many files at once; the problem is that the resulting document keeps only one value for each metadata field, so the titles, dates, and authors of the individual files are lost.

To avoid losing the metadata of individual files being merged, you can use this Pandoc filter:

local author, date, title

-- Helper function to add "st", "nd", "rd", or "th" suffix to the day
local function day_suffix(day)
  local d = tonumber(day)
  if d == 1 or d == 21 or d == 31 then
    return "st"
  elseif d == 2 or d == 22 then
    return "nd"
  elseif d == 3 or d == 23 then
    return "rd"
  else
    return "th"
  end
end

-- Function to format date as "November 6th, 2004"
local function format_date(raw_date)
  -- Parse the date to get year, month, and day
  local year, month, day = raw_date:match("^(%d%d%d%d)%-(%d%d)%-(%d%d)")
  if year and month and day then
    -- Convert month number to name
    local month_name = os.date("%B", os.time{year=year, month=month, day=day})
    -- Remove leading zero from the day
    day = tostring(tonumber(day))
    -- Add the correct suffix to the day
    local formatted_day = day .. day_suffix(day)
    return month_name .. " " .. formatted_day .. ", " .. year
  else
    return raw_date  -- return raw if date parsing fails
  end
end

-- Function to handle metadata and remove it from the output
function Meta(meta)
  author = meta.author and pandoc.utils.stringify(meta.author) or "Unknown author"
  date = meta.date and pandoc.utils.stringify(meta.date) or "Unknown date"
  title = meta.title and pandoc.utils.stringify(meta.title) or "Untitled"

  -- Format the date in the desired format
  date = format_date(date)

  -- Remove metadata block by returning an empty table
  return {}
end

-- Function to modify the document
function Pandoc(doc)
  -- Create a level 1 header with the title
  local header = pandoc.Header(1, title)

  -- Create a paragraph with "By {author}, {date}"
  local byline = pandoc.Para({pandoc.Str("By " .. author .. ", " .. date)})

  -- Insert the new elements at the beginning of the document
  table.insert(doc.blocks, 1, byline)
  table.insert(doc.blocks, 1, pandoc.Plain({})) -- empty line
  table.insert(doc.blocks, 1, header)

  -- Return the modified document
  return doc
end
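
One detail worth knowing: os.date("%B", ...) depends on the locale of the process, so the month name may not come out in English on every system. The following is a sketch of a locale-independent alternative, assuming dates always arrive as YYYY-MM-DD. MONTHS, ordinal, and format_iso_date are hypothetical names, and this Meta function rewrites the date in place instead of removing the metadata:

```lua
-- English month names, indexed by month number (1-12).
local MONTHS = {
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
}

-- Returns the day with its English ordinal suffix: 1 -> "1st", 11 -> "11th".
local function ordinal(day)
  local last_two = day % 100
  if last_two >= 11 and last_two <= 13 then
    return string.format("%dth", day)
  end
  local suffixes = { "st", "nd", "rd" }
  return string.format("%d%s", day, suffixes[day % 10] or "th")
end

-- Formats "2025-03-28" as "March 28th, 2025". Returns the input unchanged
-- when it does not start with a YYYY-MM-DD date or the month is out of range.
local function format_iso_date(raw_date)
  local year, month, day = raw_date:match("^(%d%d%d%d)%-(%d%d)%-(%d%d)")
  local month_name = month and MONTHS[tonumber(month)]
  if not month_name then
    return raw_date
  end
  return month_name .. " " .. ordinal(tonumber(day)) .. ", " .. year
end

function Meta(meta)
  if meta.date then
    meta.date = format_iso_date(pandoc.utils.stringify(meta.date))
  end
  return meta
end
```

The lookup table trades a few extra lines for output that stays the same regardless of where the filter runs.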

Let’s see this filter in action. Take, for example, the source of this article:

$ cat source.md

---
title: "Pandoc Filters in Lua"
date: 2025-03-28
draft: false
tags: ['lua', 'pandoc', 'markdown', 'hugo', 'blogging']
author: "Adrian Kosmaczewski"
---

You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](/blog/exporting-hugo-to-pdf/). While doing that, I started using
Lua filters for Pandoc to drive and customize the conversion process.
This article explains some of those filters.

Applying the filter, we get the following Markdown, with the title, author, and date information embedded in the text.

$ pandoc --from=markdown --to=markdown --lua-filter=extract-metadata.lua source.md

# Pandoc Filters in Lua

By Adrian Kosmaczewski, March 28th, 2025

You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](/blog/exporting-hugo-to-pdf/). While doing that, I started using
Lua filters for Pandoc to drive and customize the conversion process.
This article explains some of those filters.

You can also apply several Lua filters in the same Pandoc invocation by repeating the --lua-filter argument (or its shorthand -L) as often as needed; the filters run in the order in which they appear on the command line:

$ pandoc -f markdown -t markdown -L extract-metadata.lua \
    -L normalize-links.lua source.md

Setting Image Paths with Environment Variables

Another important issue when generating PDF files from Hugo sources is images. In the source repository for my sites, I store images next to the Markdown file, which means that I need to pass the path of that folder to the command that generates the PDF. I chose to use an environment variable for that:

local path = require "pandoc.path"

function Image(img)
  -- Check if the PANDOC_SOURCE environment variable is set
  local source_dir = os.getenv("PANDOC_SOURCE")
  if source_dir then
    -- Join the source directory with the image source path
    img.src = path.join{source_dir, img.src}
  end
  return img
end
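
The filter above prefixes every image source, including remote URLs and paths that are already absolute. Here is a sketch of a more cautious variant, assuming a Pandoc version that ships the pandoc.path module; the is_url helper is a hypothetical name and only recognizes sources of the form scheme://...:

```lua
-- Hypothetical helper: true for sources such as "https://example.com/a.png".
local function is_url(src)
  return src:match("^%a[%w+.-]*://") ~= nil
end

function Image(img)
  -- Read the directory from the PANDOC_SOURCE environment variable.
  local source_dir = os.getenv("PANDOC_SOURCE")
  if source_dir
      and not is_url(img.src)
      and not pandoc.path.is_absolute(img.src) then
    -- Only relative local paths get the source directory prepended.
    img.src = pandoc.path.join{source_dir, img.src}
  end
  return img
end
```

When PANDOC_SOURCE is not set, the image is returned unchanged, just as in the original filter.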

This is a very simple Markdown file that includes an image:

This is an image:

![](image.png)

Setting the PANDOC_SOURCE variable and applying the filter injects the full path to the image, which helps to generate a complete PDF or EPUB file:

$ PANDOC_SOURCE=/home/user pandoc -f markdown \
    -t markdown -L add-path-to-images.lua source.md

This is an image:

![](/home/user/image.png)

Opening External Links in a New Tab

If you would like the external links on your Hugo site (those whose target starts with “http”) to open in a separate browser window or tab, you can use the Lua filter below:

function Link(link)
  if string.match(link.target, '^http') then
    link.attributes.target = '_blank'
  end
  return link
end
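
The target attribute only has an effect in HTML output. A possible refinement, sketched here under the assumption that other output formats should be left untouched, checks the FORMAT global that Pandoc exposes to Lua filters. Adding rel="noopener" is an extra choice made for this example, not something the original filter does:

```lua
function Link(link)
  -- FORMAT is a global set by Pandoc to the name of the output format.
  local html_output = FORMAT:match("html") ~= nil
  if html_output and link.target:match("^https?://") then
    link.attributes.target = "_blank"
    -- Assumption: external pages should not get access to window.opener.
    link.attributes.rel = "noopener"
  end
  return link
end
```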

Removing script Tags

The filter below removes raw HTML blocks that contain a <script> element from your sources, so that the final output does not include them. This is particularly useful when creating EPUB files, which are essentially compressed archives of HTML content.

function RawBlock(el)
  if el.format == "html" and el.text:find("<script") and el.text:find("</script>") then
    return {}  -- an empty list deletes the element; nil would leave it unchanged
  end
end
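
A slightly more tolerant sketch of the same idea follows. It assumes that any raw HTML block with an opening script tag should be dropped, whatever the letter case of the tag, and it relies on the Pandoc convention that returning an empty list deletes an element:

```lua
function RawBlock(el)
  -- Lowercase the text so that <SCRIPT> and <Script> are matched too.
  local text = el.text:lower()
  if el.format:match("^html") and text:find("<script[%s>]") then
    return {}  -- an empty list removes the element from the document
  end
  -- Returning nothing leaves every other raw block unchanged.
end
```

Inline scripts embedded in the middle of a paragraph are not covered by this sketch, since Pandoc represents those as RawInline elements.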

Removing Arbitrary Text

You can use Pandoc filters to remove pretty much any text that appears in your sources.

function Str(elem)
  -- A Str element holds a single word without spaces, so the pattern
  -- must not contain any. "REMOVEME" is a placeholder marker.
  if elem.text:match("^REMOVEME") then
    return {}  -- an empty list deletes the element
  end
  return elem
end
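
Pandoc splits running text into Str and Space elements, so a Str filter only ever sees one word at a time. To match a phrase that spans several words, one option is to work at the paragraph level. The sketch below assumes that dropping the whole paragraph is acceptable; PHRASE is a hypothetical placeholder:

```lua
-- Hypothetical phrase; paragraphs starting with it are dropped.
local PHRASE = "some text to remove"

function Para(para)
  -- Flatten the paragraph into plain text for the comparison.
  local text = pandoc.utils.stringify(para)
  if text:sub(1, #PHRASE) == PHRASE then
    return {}  -- an empty list removes the paragraph
  end
  return para
end
```

Comparing with sub instead of match keeps the phrase a plain string, so characters that are special in Lua patterns need no escaping.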

Removing Images

Sometimes you only want the text in the output, and no references to images whatsoever. The filter below does exactly that:

function Image(img)
  return {}
end
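
If the surrounding text refers to the images, removing them outright can leave gaps. As an alternative sketch, assuming the images carry a meaningful description, the filter can return the description in place of the image:

```lua
function Image(img)
  -- img.caption holds the image description (alt text) as a list of
  -- inlines; returning it replaces the image with that text, or with
  -- nothing when the description is empty.
  return img.caption
end
```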

Removing the Title

This short filter removes every first-level header from your source Markdown, leaving headers of all other levels intact:

function Header(el)
  if el.level == 1 then
    return {}
  end
  return el
end
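
The filter above removes every level-1 header it finds. If only the first one (typically the title) should go, a flag can record that it has already been seen. This is a sketch that assumes headers are visited in document order; seen_title is a hypothetical name:

```lua
-- Becomes true once the first level-1 header has been removed.
local seen_title = false

function Header(el)
  if el.level == 1 and not seen_title then
    seen_title = true
    return {}  -- an empty list removes this header only
  end
  return el
end
```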