This content originally appeared on Level Up Coding – Medium and was authored by Itsuki
All the problems I ran into while configuring the Dockerfile for Ubuntu/Debian! Could not connect to deb, tofu-ed characters, and more!

As always, everything is available on GitHub!
Disclaimer!
Okay, I said for free, but obviously we still need to pay for running Lambda, or wherever you are hosting the function!
First of all, there are a bunch of paid services, either Web APIs or SDKs, available!
But! But! But! But!
Obviously, that is not really interesting! (Yes, interesting or not is how I judge whether I will do something or not!)
We add the SDK, and call it! End!
So!
Let’s try another (hopefully a little more interesting) approach here in this article!
Where we will be using a combination of LibreOffice, Docker, Express, and serving our little API on Lambda!
Basic Idea
Quick Introduction to LibreOffice
First of all, LibreOffice is a private, free and open source office suite compatible with Microsoft Office/365 files such as .doc, .docx, .xls, .xlsx, .ppt, .pptx.
We can manipulate (edit, export, etc.) office documents using the provided GUI, but it also comes with command-line functionality for converting any office document (Word, Excel, PowerPoint, etc.) into PDF.
On Linux, it is (or can be) as simple as the following.
/usr/bin/libreoffice --headless --convert-to pdf source-file.xlsx
This will automatically choose a filter for conversion, for example, calc_pdf_Export for Excel, and output the converted PDF file with the following print out to stdout.
convert /tmp/Print_180_45_4.xlsx as a Calc document -> /app/Print_180_45_4.pdf using filter: calc_pdf_Export
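If we want to know where the PDF actually ended up, we can pull the paths and the filter name out of that line. Below is a minimal sketch. The function name parseConvertLine is made up for illustration, and the pattern is only an assumption based on the sample line above (as far as I know, the stdout format is not a documented, stable interface), so treat it as a debugging aid rather than something to rely on.

```typescript
// Hypothetical helper: parses one "convert ... -> ... using filter: ..." line.
// The expected shape is an assumption taken from the sample output above.
interface ConversionLine {
  source: string;
  target: string;
  filter: string;
}

function parseConvertLine(line: string): ConversionLine | undefined {
  // The "as a Calc document" part is treated as optional.
  const pattern =
    /^convert (.+?)(?: as an? .+? document)? -> (.+) using filter\s*:\s*(\S+)\s*$/;
  const match = pattern.exec(line);
  if (!match) {
    return undefined;
  }
  return { source: match[1], target: match[2], filter: match[3] };
}

const sample =
  "convert /tmp/Print_180_45_4.xlsx as a Calc document -> /app/Print_180_45_4.pdf using filter: calc_pdf_Export";
console.log(parseConvertLine(sample));
```

For the sample line, target comes back as /app/Print_180_45_4.pdf, which is a handy way to notice that the output did not land where we expected.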
If we know what kind of file we are expecting, we can also explicitly specify the filter together with some other configurations.
/usr/bin/libreoffice --headless --convert-to 'pdf:calc_pdf_Export:{"SinglePageSheets":{"type":"boolean","value":"true"},"PaperSize":{"type":"string","value":"A4"}}' "${sourceFilePath}"
Here is the full reference if you are interested!
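Writing that JSON by hand inside a shell string is easy to get wrong, so here is a small sketch that builds the --convert-to value from an object instead. The helper name buildConvertToArg and the FilterOption type are made up for illustration; the only option names used are the two from the command above.

```typescript
// Hypothetical helper: builds the value passed to --convert-to.
// Each option follows the {"type": ..., "value": ...} shape shown in the command above.
type FilterOption = { type: string; value: string };

function buildConvertToArg(
  filterName: string,
  options: Record<string, FilterOption>
): string {
  // JSON.stringify keeps the insertion order of the keys.
  return `pdf:${filterName}:${JSON.stringify(options)}`;
}

const convertToArg = buildConvertToArg("calc_pdf_Export", {
  SinglePageSheets: { type: "boolean", value: "true" },
  PaperSize: { type: "string", value: "A4" },
});
console.log(convertToArg);
```

The result is the same string as in the command above, without the surrounding single quotes. Those quotes are only needed when the value goes through a shell.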
General Approach
Obviously we don’t want an entire server or a whole EC2 instance just for running the command above!
(Maybe you do, but I don’t!)
Here is where Docker shines! (I meant containers! Sorry Apple Container, I am making a GUI for you, but deep down, Docker and container are still equivalent to me!)
So! Here is what we will be doing!
Within our function code, we
- Receive the file bytes
- Temporarily save it with fs
- Convert it to PDF with the command above using child_process
- Read the PDF converted and return the bytes!
And for our Docker image, we will have an Ubuntu/Debian image with A BUNCH of setup, which we will be diving into in a couple of seconds!
Set Ups
Just to make sure we are on the same page!
Because we will be referring to some of those within the Dockerfile!
Folder Structure
.
├── Dockerfile
├── build
├── node_modules
├── package-lock.json
├── package.json
├── src
└── tsconfig.json
package.json
{
"name": "office-to-pdf",
"version": "1.0.0",
"main": "index.js",
"license": "MIT",
"type": "commonjs",
"scripts": {
"build": "tsc && chmod 755 build/*",
"dev": "npm run build && node build/index.js"
},
"dependencies": {
"ejs": "^3.1.10",
"express": "^5.1.0"
},
"devDependencies": {
"@types/express": "^5.0.1",
"@types/node": "^22.14.0",
"typescript": "^5.8.3"
}
}
The only thing here is to make sure to have a build script! We will be calling it from the Dockerfile!
tsconfig.json
{
"compilerOptions": {
"target": "esnext",
"module": "nodenext",
"moduleResolution": "nodenext",
"outDir": "./build",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
},
"include": [
"src/**/*"
],
"exclude": [
"node_modules"
]
}
Please make sure that outDir is set to build!
Lambda/Express Code
As you might have already realized from the package.json above, I will be using Express here instead of simply defining some handlers, so that we can add more endpoints if we want, and we can easily host it anywhere outside of Lambda!
This is the simple part! So! Full code first, then a couple of points to be a little careful about!
// src/index.ts
import express, { Request, Response } from 'express'
import * as fsAsync from 'fs/promises'
import * as fs from 'fs'
import { join, parse as parsePath } from "path"
import { exec } from 'child_process'
import { promisify } from 'util'
const execAsync = promisify(exec)
const LIBRE_OFFICE_BIN = "/usr/bin/libreoffice"
// write to the /tmp directory because Lambda is read only except for /tmp
const TEMP_FOLDER_PAHT = "/tmp"
const MIME_PDF = "application/pdf"
const PORT = 8080
const app = express()
app.use(express.raw({ type: '*/*' }))
/***********************/
/******* Routes ********/
/***********************/
// For Web adaptor Readiness Check: https://github.com/awslabs/aws-lambda-web-adapter?tab=readme-ov-file#readiness-check
// By default, Lambda Web Adapter will send HTTP GET requests to the web application at http://127.0.0.1:8080/.
// The port and path can be customized with two environment variables: AWS_LWA_READINESS_CHECK_PORT and AWS_LWA_READINESS_CHECK_PATH.
// Lambda Web Adapter will retry this request every 10 milliseconds until the web application returns an HTTP response (status code >= 100 and < 500) or the function times out.
// After passing readiness check, Lambda Web Adapter will start Lambda Runtime and forward the invokes to the web application.
app.get("/health-check", async (req: Request, res: Response) => {
res.sendStatus(200)
return
})
app.post('/', async (req: Request, res: Response) => {
const bytes: Uint8Array | undefined = req.body as Uint8Array | undefined
// The checks below reject bad requests, so they answer with 400 (client error) rather than 500.
if (bytes === undefined) {
res.status(400).send({
error: true,
message: "Invalid request body."
})
return
}
const header = req.headers['content-disposition']
if (header === undefined) {
res.status(400).send({
error: true,
message: "`content-disposition` header required."
})
return
}
const filename = extractFileName(header)
if (filename === undefined) {
res.status(400).send({
error: true,
message: "Invalid `content-disposition` header."
})
return
}
if (!checkFileFormat(filename)) {
res.status(400).send({
error: true,
message: "Invalid file format."
})
return
}
try {
await writeFile(filename, bytes)
console.log(`Data successfully written to ${filename}`)
} catch (error) {
res.status(500).send({
error: true,
message: `Error writing data to file: ${error}`
})
return
}
try {
const convertedPDFBuffer = await convertToPDF(filename)
console.log(`convertedPDFBuffer ${convertedPDFBuffer.length}`)
await cleanupFiles(filename)
res.set({
'Content-Type': MIME_PDF,
'Content-disposition': `attachment;filename=${pdfFileName(filename)}`,
'Content-Length': convertedPDFBuffer.length,
})
res.status(200).send(convertedPDFBuffer)
return
} catch (error) {
await cleanupFiles(filename)
res.status(500).send({
error: true,
message: `Error converting to PDF: ${error}`
})
return
}
})
/*********************************/
/******* Helper Functions ********/
/*********************************/
function extractFileName(contentDisposition: string): string | undefined {
const regex = /((.|\s\S|\r|\n)*)filename\*=(utf-8|UTF-8)''(?<name>((.|\s\S|\r|\n)*))/
const match = contentDisposition.match(regex)
if (match && match.groups) {
const { name } = match.groups
return name
} else {
return undefined
}
}
function checkFileFormat(filename: string): boolean {
const allowedExtensions: string[] = [".odt", ".ods", ".odp", ".odg", ".doc", ".docx", ".xls", ".xlsx", ".xlt", ".ppt", ".pptx", ".pps", ".pub", ".wps", ".rtf", ".sxw", ".sxc", ".sxi", ".sxp", ".wk1", ".wks", ".123"]
return allowedExtensions.includes(fileExtension(filename))
}
function makeFilePath(filename: string) {
const filePath = join(TEMP_FOLDER_PAHT, filename)
return filePath
}
function fileExtension(filename: string): string {
return parsePath(filename).ext
}
function fileNameWithoutExtension(filename: string): string {
return parsePath(filename).name
}
function pdfFileName(filename: string): string {
return `${fileNameWithoutExtension(filename)}.pdf`
}
async function writeFile(filename: string, data: Uint8Array): Promise<void> {
const filePath = makeFilePath(filename)
await fsAsync.mkdir(TEMP_FOLDER_PAHT, { recursive: true })
await fsAsync.writeFile(filePath, data)
}
async function convertToPDF(filename: string): Promise<Buffer> {
const sourceFilePath = makeFilePath(filename)
const destinationFilePath = makeFilePath(pdfFileName(filename))
// Important point:
// We need to specify the output directory as /tmp.
// Otherwise, will be written to /app (or whatever specified by the Dockerfile).
// ie: command finished: convert /tmp/Print_180_45_4.xlsx as a Calc document -> /app/Print_180_45_4.pdf using filter: calc_pdf_Export
const command = `"${LIBRE_OFFICE_BIN}" --headless --convert-to pdf "${sourceFilePath}" --outdir "${TEMP_FOLDER_PAHT}"`
console.log("Executing command to convert: ", command)
const { stdout, stderr } = await execAsync(command)
if (stderr) {
throw new Error(`Error executing command: ${stderr}`)
}
console.log("command finished: ", stdout)
if (!fs.existsSync(destinationFilePath)) {
throw new Error("Error converting file to PDF.")
}
const fileBuffer: Buffer = await fsAsync.readFile(destinationFilePath)
return fileBuffer
}
async function cleanupFiles(filename: string) {
const sourceFilePath = makeFilePath(filename)
const destinationFilePath = makeFilePath(pdfFileName(filename))
try {
await fsAsync.unlink(sourceFilePath)
await fsAsync.unlink(destinationFilePath)
console.log(`Removed files successfully`)
} catch (error) {
console.log(`Error removing files: ${error}`)
}
}
/******************************/
/******* Start the App ********/
/******************************/
app.listen(PORT, () => {
console.log(`Server listening at http://localhost:${PORT}`)
})
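One thing to be aware of in convertToPDF above: the file name comes from a request header and is interpolated into a shell command string, so a name containing shell syntax could do unpleasant things. As a hedged alternative (not what the code above does), here is a sketch using Node's execFile, which passes the arguments as an array without going through a shell. The names buildConvertArgs and convertWithExecFile are made up for illustration.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Same binary path and temp folder as in the code above.
const LIBRE_OFFICE_BIN = "/usr/bin/libreoffice";
const TEMP_FOLDER_PATH = "/tmp";

// Hypothetical helper: every element reaches LibreOffice as-is, no shell quoting involved.
function buildConvertArgs(
  sourceFilePath: string,
  outDir: string = TEMP_FOLDER_PATH
): string[] {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, sourceFilePath];
}

// Hypothetical variant of the conversion step.
// The promisified execFile rejects when the process exits with a non-zero code.
async function convertWithExecFile(sourceFilePath: string): Promise<string> {
  const { stdout } = await execFileAsync(
    LIBRE_OFFICE_BIN,
    buildConvertArgs(sourceFilePath)
  );
  return stdout;
}

console.log(buildConvertArgs("/tmp/demo file.xlsx"));
```

Note that this sketch relies on the exit code instead of checking stderr, which is a different choice from the code above.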
Couple Points
- TEMP_FOLDER_PAHT: We probably all know by now, but Lambda is on a read-only file system except for this /tmp folder!
- Set --outdir for the libreoffice command to make sure that the PDF is also written to the /tmp folder.
- The health-check endpoint here is used by lambda-web-adapter to perform its Readiness Check.
- I am requiring the Content-Disposition header in the format Content-Disposition: attachment; filename*=UTF-8''demo.xlsx so that I can get the file name.
- I am accepting bytes directly instead of using multipart/form-data because API Gateway base64-encodes the bytes within those forms regardless of the binaryMediaTypes set!
That’s it!
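Since the file name from the Content-Disposition header ends up in a path under /tmp, it may be worth being a bit stricter about it. The following is a minimal sketch of an alternative to extractFileName, not the version used above: it percent-decodes the value and keeps only the last path segment. The name extractSafeFileName is made up for illustration.

```typescript
import { basename } from "node:path";

// Hypothetical alternative parser for: attachment; filename*=UTF-8''demo.xlsx
function extractSafeFileName(contentDisposition: string): string | undefined {
  const match = /filename\*=UTF-8''([^;]+)/i.exec(contentDisposition);
  if (!match) {
    return undefined;
  }
  let decoded: string;
  try {
    // The filename* value is percent-encoded.
    decoded = decodeURIComponent(match[1].trim());
  } catch {
    // Malformed percent-encoding
    return undefined;
  }
  // Keep only the last path segment so "../" cannot escape the temp folder.
  const name = basename(decoded);
  return name === "" || name === "." || name === ".." ? undefined : name;
}

console.log(extractSafeFileName("attachment; filename*=UTF-8''demo.xlsx"));
```

With this version, a header carrying ..%2F..%2Fetc%2Fpasswd is reduced to the bare name passwd, which the extension check would then reject.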



Dockerfile 



If I spent 30 minutes on the code above, I probably have spent 100 * 30 minutes on the Dockerfile!
1. Ubuntu/Debian is not the easiest base image to work with!
2. Asian characters show up as tofu on Linux!
3. Lambda’s read-only file system!
4. Some parts of the Dockerfile work on one machine but not on others… (Both are Macs…)
(5. I am stupid….)
Final version first! We will then dive into it bit by bit!
FROM debian:stable
# If encountered「Could not connect to deb.debian.org:80 (151.101.110.132)」error,
# we can try to set up the proxy like following.
# It might be needed on some machines.
# ENV http_proxy="http://ftp.jp.debian.org"
# ENV https_proxy="http://ftp.jp.debian.org"
########################
# Font + Locale Set Up #
########################
# Required to display Japanese.
# Otherwise Japanese characters (and other Asian characters) will be tofu-rized...
# Modify contrib.list.
# Needed for installing ttf-mscorefonts-installer
# Using software-properties-common and add-apt-repository like following will not work.
# RUN apt-get update && \
# apt-get install -y software-properties-common && \
# add-apt-repository multiverse && \
# apt-get update && \
# apt-get install -y ttf-mscorefonts-installer fontconfig && \
# fc-cache -f -v
# E: Unable to locate package software-properties-common
RUN echo "deb http://deb.debian.org/debian stable contrib non-free" > /etc/apt/sources.list.d/contrib.list
# Ubuntu's language-pack-ja does not exist for Debian
RUN apt-get update && \
apt-get install -y locales locales-all
# set locales
# For other languages, use zh_CN.UTF-8, ko_KR.UTF-8, etc.
# Simply adding fonts is not enough.
RUN locale-gen ja_JP.UTF-8
RUN update-locale LANG=ja_JP.UTF-8
# Install necessary fonts
RUN apt-get update && \
apt-get install -y \
# fonts for japanese
fonts-noto-cjk \
# common microsoft office fonts
ttf-mscorefonts-installer \
fontconfig
# Rebuild the font cache so that the newly installed fonts are picked up
RUN fc-cache -fv
###################################################
# libreoffice + Related Dependencies Installation #
###################################################
# install libreoffice dependencies
RUN apt-get update && \
apt-get install -y \
# required for libreoffice to function. Otherwise: Warning: failed to launch javaldx - java may not function correctly
libreoffice-java-common \
default-jre-headless \
libreoffice
########################
# Node JS Installation #
########################
RUN apt-get update && \
apt-get install -y \
--no-install-recommends nodejs \
# npm not coming together with nodejs
npm
######################
# Web Adapter Set up #
######################
# https://github.com/awslabs/aws-lambda-web-adapter?tab=readme-ov-file
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
# Has to be the same port as the one the Express app listens on
# defaults to "8080"
# Please also avoid port 9001 and 3000.
# Lambda Runtime API is on port 9001. CloudWatch Lambda Insight extension uses port 3000.
ENV PORT="8080"
# readiness check path
ENV READINESS_CHECK_PORT="8080"
ENV READINESS_CHECK_PATH="/health-check"
###################################################
# Env Set up Due to Lambda: Read-only file system #
###################################################
# Lambda: Read-only file system
# by default npm writes its cache and logs under ~/.npm
ENV NPM_CONFIG_CACHE=/tmp/.npm
# By default, java writes to /home/user/.cache/dconf
# Set user.home to /tmp
ENV JAVA_OPTS="-Duser.home=/tmp"
# libreoffice needs to create a dir called .cache/dconf in the HOME dir.
# So HOME must be writable. But on aws lambda, the default HOME is read-only.
ENV HOME=/tmp
##########################
# Regular Express Set up #
##########################
WORKDIR /app
# RUN mkdir -p ${FUNCTION_DIR}
COPY package*.json ./
# devDependencies are needed here: `npm run build` below calls tsc from the typescript package
RUN npm install
# Install the runtime interface client
# https://docs.aws.amazon.com/lambda/latest/dg/nodejs-image.html#nodejs-image-clients
# RUN npm install aws-lambda-ric
COPY . .
RUN npm run build
# Expose the port your application listens on
# has to be the same as the PORT env above
EXPOSE 8080
# Start the Express server; Lambda Web Adapter forwards the invocations to it
CMD ["node", "build/index.js"]
Important Points
E: Could not connect to deb.debian.org:80 (151.101.110.132)
If you are running into the above error while trying to apt-get update, you could try to set up the proxy like the following.
ENV http_proxy="http://ftp.jp.debian.org"
ENV https_proxy="http://ftp.jp.debian.org"
At least for me, the problem was caused by not having proxies set up correctly on one of the machines.
E: Unable to locate package software-properties-common
When we are trying to add ttf-mscorefonts-installer, a package with some common Microsoft Office fonts, one common suggestion you might encounter is to run add-apt-repository multiverse first, which requires us to install software-properties-common like the following.
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository multiverse && \
apt-get update && \
apt-get install -y ttf-mscorefonts-installer fontconfig && \
fc-cache -f -v
HOWEVER!
On the Debian image, this gives me the error above, Unable to locate package software-properties-common.
That’s why I add the contrib and non-free components directly instead (multiverse is an Ubuntu component; on Debian, ttf-mscorefonts-installer lives in contrib).
RUN echo "deb http://deb.debian.org/debian stable contrib non-free" > /etc/apt/sources.list.d/contrib.list
Locales & Fonts
First of all, without setting locales, Japanese characters (and some other Asian characters) will show up as tofu!

To solve this problem, only adding the necessary fonts such as fonts-noto-cjk is NOT ENOUGH!
We will need to install locales!
However, Ubuntu’s language-pack-ja (or whatever you need) does not exist for Debian! And that’s why we have installed locales and locales-all here instead!
We can then install the fonts needed along with fontconfig for font management.
Here I have also rebuilt the font cache with fc-cache -fv just to ensure the newly added ones are reflected correctly.
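To check from inside the container that fontconfig can actually see a font for a given language, something like the following sketch might help. It assumes the fc-list command from the fontconfig package installed above, and that it prints one font entry per line; the function names are made up for illustration.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: counts the non-empty lines of `fc-list` output,
// assuming one font entry per line.
function countFontEntries(fcListOutput: string): number {
  return fcListOutput.split("\n").filter((line) => line.trim() !== "").length;
}

// Hypothetical helper: asks fontconfig for fonts matching the given language tag,
// e.g. "ja" for Japanese.
async function countFontsForLanguage(lang: string): Promise<number> {
  const { stdout } = await execFileAsync("fc-list", [`:lang=${lang}`]);
  return countFontEntries(stdout);
}

// Example with made-up output lines (not real fc-list output):
console.log(countFontEntries("font-a.ttc: Example Sans\nfont-b.ttc: Example Serif\n"));
```

If countFontsForLanguage("ja") comes back as 0 inside the container, the tofu is probably a missing-font problem rather than a locale problem.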
NodeJS and npm
Of course, the Debian image does not come with nodejs installed! However, as you might notice, we are also explicitly installing npm! It does not come with nodejs, even without the --no-install-recommends flag!
Install libreoffice
For LibreOffice to actually be able to run, we will also need to add those Java-related packages, i.e., libreoffice-java-common and default-jre-headless!
Lambda Web Adapter
When we search for how to deploy an Express app on Lambda, the first thing that shows up is probably a wrapper library such as @codegenie/serverless-express.
However, we are not doing that here (I have tried…) because!
First of all, it requires us to modify the Express code above, which makes it Lambda-specific, and that is not really something I want to see here!
Secondly and most importantly, we don’t have an AWS base image!
Which means, to use the handler converted from the Express app, we will have to install aws-lambda-ric in our Dockerfile, set up the entry points, and blah!
This is painful!
Taking installation as an example, here is what AWS suggests in Using an alternative base image with the runtime interface client.
# Install build dependencies
RUN apt-get update && \
apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev
# Install the runtime interface client
RUN npm install aws-lambda-ric
However, here is what we really needed.
RUN apt-get update && \
apt-get install -y \
build-essential \
make \
cmake \
autoconf \
automake \
libtool \
m4 \
python3 \
python3-setuptools \
unzip \
libssl-dev \
libcurl4-openssl-dev
Anyway, to set up the Web Adapter, all we have added is the following four lines! (Actually, only two are needed, because 8080 is the default port anyway!)
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
ENV PORT="8080"
ENV READINESS_CHECK_PORT="8080"
ENV READINESS_CHECK_PATH="/health-check"
Set Environments
Again, that read-only file system on Lambda!
- NPM_CONFIG_CACHE for the npm cache and logs
- JAVA_OPTS for Java, because LibreOffice uses it, so that user.home points to /tmp
- HOME. This one might not be obvious, but LibreOffice needs to create a directory called .cache/dconf in the HOME directory and by default, it is something like /opt/user…/ which is obviously not writable in Lambda’s world!
I know, a fairly short Dockerfile (if you remove those comments) but with so many points!
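If you want the app to complain early when one of those variables is missing, a tiny startup check could look like the sketch below. It is only an illustration: the function name findMisconfiguredPaths is made up, and it only looks at HOME and NPM_CONFIG_CACHE, since those two hold plain paths.

```typescript
// Lambda only lets us write under /tmp.
const WRITABLE_ROOT = "/tmp";

// The path-valued variables set in the Dockerfile above.
const PATH_VARS = ["HOME", "NPM_CONFIG_CACHE"] as const;

// Hypothetical helper: returns the names of variables that are unset
// or that point somewhere outside of /tmp.
function findMisconfiguredPaths(
  env: Record<string, string | undefined>
): string[] {
  return PATH_VARS.filter((name) => {
    const value = env[name];
    if (value === undefined) {
      return true;
    }
    return !(value === WRITABLE_ROOT || value.startsWith(WRITABLE_ROOT + "/"));
  });
}

console.log(findMisconfiguredPaths({ HOME: "/tmp", NPM_CONFIG_CACHE: "/tmp/.npm" }));
```

In the Express app, this could be called once with process.env before app.listen, logging a warning when the returned list is not empty.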
Build + Test It Locally
Before we put it onto AWS, let’s build the image and test it out really quick!
docker build . -t office-to-pdf-converter
docker run -p 8080:8080 office-to-pdf-converter
We can then POST our file to localhost:8080!

Yes! Un-tofu-ed characters!
Deploy With CDK
If you want to manually do everything or you are planning on using some other deployment method such as terraform, you can skip it, but!
The stack is super simple, so! Please let me share it here really quick!
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { join } from 'path'
import { EndpointType, LambdaRestApi } from 'aws-cdk-lib/aws-apigateway'
import { Duration } from "aws-cdk-lib"
import { DockerImageCode, DockerImageFunction, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Platform } from 'aws-cdk-lib/aws-ecr-assets';
export class CdkStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const apigatewayLambda = new DockerImageFunction(this, "OfficeToPDFLambda", {
// directory containing Dockerfile
code: DockerImageCode.fromImageAsset(join(__dirname, '..', '..', 'lambda'), {
// required for MacOS.
// Otherwise, we will get Error: fork/exec /opt/extensions/lambda-adapter: exec format error Extension.LaunchError when launching Lambda
platform: Platform.LINUX_AMD64
}),
timeout: Duration.minutes(5),
memorySize: 10240
});
const restApi = new LambdaRestApi(this, 'OfficeToPDFAPIGateway', {
handler: apigatewayLambda,
endpointTypes: [EndpointType.REGIONAL],
// required for both accepting and returning binary data
binaryMediaTypes: ["application/*"],
})
}
}
Make sure to set Platform.LINUX_AMD64 when building the DockerImageFunction, otherwise, we might get (I did get) a fork/exec /opt/extensions/lambda-adapter: exec format error Extension.LaunchError when launching Lambda!
Invoke The Cloud!
In addition to what we have for the local call, we need an extra header!
Accept!

We have to set the Accept header to application/pdf!
Case-insensitive, but a wildcard won’t work!
This is required for API Gateway to send binary data back as is instead of trying to base64-encode it!
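Putting the required headers together, a client call could look like the following sketch. It uses the global fetch available in recent Node.js versions; the function names and the endpoint argument are placeholders for illustration, and the Content-Type value is my own assumption (any application/* type should match the binaryMediaTypes configured in the stack).

```typescript
// Hypothetical helper: the headers our API expects.
function buildConversionHeaders(filename: string): Record<string, string> {
  return {
    // Assumption: any application/* content type
    "Content-Type": "application/octet-stream",
    // The format the Express code parses the file name from
    "Content-Disposition": `attachment; filename*=UTF-8''${encodeURIComponent(filename)}`,
    // Needed for API Gateway to return the PDF as binary
    "Accept": "application/pdf",
  };
}

// Hypothetical client: posts the file bytes and returns the PDF bytes.
async function convertRemotely(
  endpoint: string,
  filename: string,
  bytes: ArrayBuffer
): Promise<ArrayBuffer> {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: buildConversionHeaders(filename),
    body: bytes,
  });
  if (!response.ok) {
    throw new Error(`Conversion failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}

console.log(buildConversionHeaders("demo.xlsx"));
```

The same call works against the local container too, where the Accept header is simply not needed. One caveat: the Express code in this article does not percent-decode the file name, so a non-ASCII name will come back in its encoded form.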
That’s it for this article!
Thank you for reading!
Again, feel free to grab the full code from my GitHub!
Happy converting!
LibreOffice+Docker+Express/Lambda: Convert Office To PDF. Serverless. For Free! was originally published in Level Up Coding on Medium.