You might remember that I’m using Pandoc to convert entries of this blog into PDF files. While doing that, I started using Lua filters for Pandoc to drive and customize the conversion process. This article explains some of those filters.
Prepending Domains to Relative URLs
I write the Markdown source files for my Hugo sites using relative URLs, instead of hardcoded ones that include the domain. This makes it possible to move this website to any other domain in the future, but of course, it breaks those links in the PDF generated from the same sources.
The following Pandoc filter in Lua prepends the domain of your choice to every link that does not already start with "http":
function Link(link)
  if not link.target:match("^http") then
    link.target = "https://your.site.url" .. link.target
  end
  return link
end
Here’s how to use it: the original source.md file contains a link to the Pandoc website, and another to a post on this website.
$ cat source.md
You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](/blog/exporting-hugo-to-pdf/). While doing that, I
started using Lua filters for Pandoc to drive and customize the
conversion process. This article explains some of those filters.
After applying the filter, the local link includes the “https://your.site.url” domain:
$ pandoc --from=markdown --to=markdown --lua-filter=normalize-links.lua source.md
You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](https://your.site.url/blog/exporting-hugo-to-pdf/). While doing that, I
started using Lua filters for Pandoc to drive and customize the
conversion process. This article explains some of those filters.
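A filter that rewrites every target not starting with "http" also touches mailto: links and in-page anchors such as #notes. The following is a minimal sketch of a more defensive variant, not the filter used for this site: it only rewrites targets that begin with a single slash, and it reads the domain from a hypothetical SITE_URL environment variable (an assumption made for this example), falling back to the placeholder domain.

```lua
-- SITE_URL is a hypothetical variable name chosen for this sketch
local base_url = os.getenv("SITE_URL") or "https://your.site.url"

-- Prefix root-relative targets ("/blog/...") with the base URL.
-- Absolute URLs, "mailto:" links, "#anchors", and protocol-relative
-- "//host/..." targets are returned unchanged.
local function absolutize(target, base)
  if target:sub(1, 1) == "/" and target:sub(2, 2) ~= "/" then
    -- Drop trailing slashes from the base to avoid "//" in the result
    return (base:gsub("/+$", "")) .. target
  end
  return target
end

function Link(link)
  link.target = absolutize(link.target, base_url)
  return link
end
```

Keeping the rewriting logic in a small helper makes it easy to check outside Pandoc with a plain Lua interpreter.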
Transforming Metadata Blocks
Markdown sources used for Hugo sites include a metadata block at the beginning of the file, with information such as the title, the date, and the author. However, when creating PDF or EPUB files, you might want to concatenate many files at once; the problem is that Pandoc merges all the metadata blocks into a single set of fields, so only one title, date, and author survives.
To avoid losing the metadata of individual files being merged, you can use this Pandoc filter:
local author, date, title

-- Helper function to add "st", "nd", "rd", or "th" suffix to the day
local function day_suffix(day)
  local d = tonumber(day)
  if d == 1 or d == 21 or d == 31 then
    return "st"
  elseif d == 2 or d == 22 then
    return "nd"
  elseif d == 3 or d == 23 then
    return "rd"
  else
    return "th"
  end
end

-- Function to format date as "November 6th, 2004"
local function format_date(raw_date)
  -- Parse the date to get year, month, and day
  local year, month, day = raw_date:match("^(%d%d%d%d)%-(%d%d)%-(%d%d)")
  if year and month and day then
    -- Convert month number to name
    local month_name = os.date("%B", os.time{year=year, month=month, day=day})
    -- Remove leading zero from the day
    day = tostring(tonumber(day))
    -- Add the correct suffix to the day
    local formatted_day = day .. day_suffix(day)
    return month_name .. " " .. formatted_day .. ", " .. year
  else
    return raw_date -- return raw if date parsing fails
  end
end

-- Function to handle metadata and remove it from the output
function Meta(meta)
  author = meta.author and pandoc.utils.stringify(meta.author) or "Unknown author"
  date = meta.date and pandoc.utils.stringify(meta.date) or "Unknown date"
  title = meta.title and pandoc.utils.stringify(meta.title) or "Untitled"
  -- Format the date in the desired format
  date = format_date(date)
  -- Remove metadata block by returning an empty table
  return {}
end

-- Function to modify the document
function Pandoc(doc)
  -- Create a level 1 header with the title
  local header = pandoc.Header(1, title)
  -- Create a paragraph with "By {author}, {date}"
  local byline = pandoc.Para({pandoc.Str("By " .. author .. ", " .. date)})
  -- Insert the new elements at the beginning of the document
  table.insert(doc.blocks, 1, byline)
  table.insert(doc.blocks, 1, pandoc.Plain({})) -- empty line
  table.insert(doc.blocks, 1, header)
  -- Return the modified document
  return doc
end
Let’s see this filter in action. Take, for example, the source of this article:
$ cat source.md
---
title: "Pandoc Filters in Lua"
date: 2025-03-28
draft: false
tags: ['lua', 'pandoc', 'markdown', 'hugo', 'blogging']
author: "Adrian Kosmaczewski"
---
You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](/blog/exporting-hugo-to-pdf/). While doing that, I started using
Lua filters for Pandoc to drive and customize the conversion process.
This article explains some of those filters.
Applying the filter, we get the following Markdown, with the title, author, and date information embedded in the text:
$ pandoc --from=markdown --to=markdown --lua-filter=extract-metadata.lua source.md
# Pandoc Filters in Lua
By Adrian Kosmaczewski, March 28th, 2025
You might remember that I'm using [Pandoc](https://pandoc.org/) to
convert entries of this blog [into PDF
files](/blog/exporting-hugo-to-pdf/). While doing that, I started using
Lua filters for Pandoc to drive and customize the conversion process.
This article explains some of those filters.
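The format_date helper obtains the month name through os.date("%B"), which takes it from the C library's current locale. As a hedged alternative, the sketch below uses a fixed table of English month names and derives the ordinal suffix arithmetically. The names MONTHS and ordinal_suffix are illustrative and not part of the filter above.

```lua
-- English month names, indexed by month number
local MONTHS = {
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
}

-- Ordinal suffix for a positive integer: 1st, 2nd, 3rd, 4th, 11th, 22nd...
local function ordinal_suffix(n)
  local last_two = n % 100
  if last_two >= 11 and last_two <= 13 then
    return "th"
  end
  local suffixes = { "st", "nd", "rd" }
  return suffixes[n % 10] or "th"
end

-- Format "2004-11-06" as "November 6th, 2004"; return the input unchanged
-- when it does not start with an ISO date that has a valid month.
local function format_date(raw_date)
  local year, month, day = raw_date:match("^(%d%d%d%d)%-(%d%d)%-(%d%d)")
  local month_name = month and MONTHS[tonumber(month)]
  if not month_name then
    return raw_date
  end
  day = tonumber(day)
  return string.format("%s %d%s, %s", month_name, day, ordinal_suffix(day), year)
end
```

This version does not validate the day of the month; it only guarantees that the output language stays the same on every machine.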
You can also apply several Lua filters in the same Pandoc invocation by repeating the --lua-filter argument (or its shorthand -L) as often as needed:
$ pandoc -f markdown -t markdown -L extract-metadata.lua \
-L normalize-links.lua source.md
Setting Image Paths with Environment Variables
Finally, there’s an important issue when generating PDF files from Hugo sources: images. In the source repository for my sites, I store images next to the Markdown file, which means that I need to pass the path to the folder to the command used to generate the PDF. I have chosen to use an environment variable for that:
local path = require "pandoc.path"

function Image(img)
  -- Check if the PANDOC_SOURCE environment variable is set
  local source_dir = os.getenv("PANDOC_SOURCE")
  if source_dir then
    -- Join the source directory with the image source path
    img.src = path.join{source_dir, img.src}
  end
  return img
end
This is a very simple Markdown file that includes an image:
This is an image:

Using the filter and setting the PANDOC_SOURCE variable injects the full path to the image, which helps to generate a fully-featured PDF or EPUB file:
$ PANDOC_SOURCE=/home/user pandoc -f markdown \
-t markdown -L add-path-to-images.lua source.md
This is an image:
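A source file may also reference remote images or absolute paths, which should probably not receive the prefix. The sketch below is one possible way to guard against that, under two assumptions of mine: paths use "/" as separator, and the helper names is_absolute and resolve are made up for this example. It uses plain string operations so that it can be tried outside Pandoc; inside a real filter, pandoc.path.join remains the more portable way to join the parts.

```lua
-- True for URLs with a scheme ("https://...", "data:...") and absolute paths
local function is_absolute(src)
  return src:match("^%a[%w+.-]*:") ~= nil or src:sub(1, 1) == "/"
end

-- Prefix relative image paths with dir; leave everything else untouched
local function resolve(src, dir)
  if dir == nil or dir == "" or is_absolute(src) then
    return src
  end
  -- Drop trailing slashes from dir before joining
  return (dir:gsub("/+$", "")) .. "/" .. src
end

function Image(img)
  img.src = resolve(img.src, os.getenv("PANDOC_SOURCE"))
  return img
end
```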

Adding “target=_blank” to Links
If you would like all external links on your Hugo site (those whose target starts with "http") to open in a separate browser window or tab, you can use the Lua filter below:
function Link(link)
  if string.match(link.target, '^http') then
    link.attributes.target = '_blank'
  end
  return link
end
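Links that open a new tab are commonly paired with a rel attribute, so that the opened page cannot reach back to the original one through window.opener. The variant below is a sketch of that idea rather than part of the original filter; it also tightens the pattern so that only http:// and https:// targets match.

```lua
function Link(link)
  if link.target:match("^https?://") then
    link.attributes.target = "_blank"
    -- Keep the opened page from accessing window.opener
    -- and from receiving the referrer
    link.attributes.rel = "noopener noreferrer"
  end
  return link
end
```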
Removing script Tags
The filter below removes any <script> tag in your HTML sources, so that your final output does not include them. This is particularly useful when creating EPUB files, which are, by definition, compressed archives filled with HTML content.
function RawBlock(el)
  if el.format == "html" and el.text:find("<script") and el.text:find("</script>") then
    -- An empty list deletes the element; returning nil would leave it unchanged
    return {}
  end
end
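In Pandoc's Lua filters, returning an empty list deletes an element, while returning nil leaves it unchanged. A raw HTML block can also contain other markup next to a script element; the sketch below, which is only one possible approach, strips the script elements themselves and deletes the block only when nothing else remains. The SCRIPT pattern is my own and has not been tested against every way of writing the tag.

```lua
-- Case-insensitive pattern for one <script ...>...</script> element.
-- ".-" takes the shortest match, and "." also matches newlines in Lua.
local SCRIPT = "<[sS][cC][rR][iI][pP][tT][^>]*>.-</[sS][cC][rR][iI][pP][tT]%s*>"

function RawBlock(el)
  if el.format ~= "html" then
    return nil -- not HTML: leave the block unchanged
  end
  local stripped = el.text:gsub(SCRIPT, "")
  if stripped == el.text then
    return nil -- no script element found
  end
  if stripped:match("^%s*$") then
    return {} -- nothing but scripts: delete the whole block
  end
  el.text = stripped
  return el
end
```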
Removing Arbitrary Text
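You can use Pandoc filters to remove specific text that appears in your sources. Keep in mind that Pandoc splits running text into one Str element per word, so the pattern can only match within a single word; a pattern containing spaces never matches.
function Str(elem)
  -- Replace with the word (or word prefix) you want to drop
  if elem.text:match("^TextToRemove") then
    return pandoc.Str("")
  end
  return elem
end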
Removing Images
Sometimes you only want the text in the output, and no references to images whatsoever. The filter below does exactly that:
function Image(img)
  return {}
end
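If dropping images entirely loses too much context, a gentler option is to keep their alternative text. The following sketch relies on the caption field of Image elements, which holds the alt text as a list of inline elements; how standalone images (figures) are represented differs between Pandoc versions, so treat it as a starting point for inline images.

```lua
function Image(img)
  -- Returning the caption's inline elements puts the alt text
  -- where the image was; an empty caption removes the image.
  return img.caption
end
```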
Removing the Title
This short filter removes every first-level header from your source Markdown, leaving headers of all other levels intact:
function Header(el)
  if el.level == 1 then
    return {}
  end
  return el
end