Compute Modules in Foundry: a field guide of shortcuts and lessons learned
May 31, 2026
Janbol Jangabyl
15 min read
TL;DR
- Compute Modules are the piece of Foundry that opens up the workloads the rest of the platform can't handle natively: GPU inference, third-party Python libraries, long-running services, anything containerized.
- They're powerful and underused. The docs are less mature than the rest of Foundry, the deployment ergonomics are a step behind Transforms and Functions, and most teams discover the rough edges the hard way.
- Four lessons that save you the most pain: start with the workflow not the container, get the deployment checklist right the first time, plan the OSDK plug-in around real data-transfer limits, and tune replicas and concurrency for your actual workload before you ship.
- This is a field guide written across countless deployments. Use it as a checklist on your first Compute Module, or as a sanity check on your tenth.
Who this is for
Foundry developers, Forward-Deployed Engineers, and ML/data engineers who have shipped Transforms, Functions, and Workshop apps and are now staring at a workload that none of those can host. If the phrase "we need a GPU" or "we need to wrap an existing Python library" or "we need a long-running service inside Foundry" has come up in the last week, this is for you.
1. Start with the workflow, not the container
The first instinct when you discover Compute Modules is to build the container. Wrap the library. Make a Hello World deploy. Run a smoke test. Feel productive.
This is a trap. The container is the easy part. The hard part is where the container plugs in, what it returns, and how the rest of the workflow stays clean around it.
Before writing the Dockerfile, walk the pipeline end to end. Identify the exact step where the Compute Module belongs. Maybe it's a slow Python Transform that would be ten times faster on a GPU. Maybe it's a Workshop button that needs to call a library Foundry can't host. Maybe it's an AIP Chatbot tool that needs to invoke a model that's too custom for AIP Logic. Whatever it is, find the one step.
The trap to avoid: do not rebuild the surrounding pipeline to suit the container. If the existing Workshop app expects an Ontology object back, the Compute Module should return something that maps cleanly to that object. Not a tuple. Not a giant blob. A structured response your downstream Action Type can ingest.
A few sub-decisions that fall out of this:
- Function execution mode vs. pipeline execution mode. Function mode is what you want for interactive workloads called from Workshop, Slate, OSDK, or AIP Logic. Pipeline mode is for autonomous data jobs that read from inputs and write to outputs continuously. The execution modes doc walks through both.
- What input you accept. Prefer a reference over raw bytes: the agent or app passes you the reference, and the Compute Module pulls the bytes via the Foundry API. Smaller payloads, cleaner audit trail.
If you can sketch the input and output shape on a whiteboard in five minutes, you're ready to write the container. If you can't, keep walking the workflow.
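One way to run that whiteboard test is to write the two shapes down as dataclasses before any container exists. The sketch below is illustrative only: DocumentRequest, ExtractedField, the field names, and the RID string are hypothetical, and the handler returns a placeholder instead of real logic.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DocumentRequest:
    # Hypothetical input: a reference to the file, not the file's bytes.
    document_rid: str
    page_limit: int


@dataclass
class ExtractedField:
    # Hypothetical output row: flat fields that can map onto object properties.
    document_rid: str
    field_name: str
    value: str
    confidence: float


def sketch_response(request: DocumentRequest) -> List[ExtractedField]:
    """Return a placeholder response with the agreed shape (no real logic yet)."""
    return [ExtractedField(request.document_rid, "invoice_total", "1250.00", 0.97)]


# Made-up reference string, used only to exercise the shapes.
request = DocumentRequest(document_rid="ri.example.document.123", page_limit=5)
response = [asdict(row) for row in sketch_response(request)]
print(json.dumps(response, indent=2))
```

If the printed JSON is not something the downstream Action Type could ingest as-is, the shape needs another pass before the Dockerfile does.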
2. Get the deployment checklist right, then deploy
The first deployment is where most teams lose two days. There's a sequence of small things, and each one has to be right. Here's the version we run.
Project structure
Mirror the structure from the getting started doc:
MyComputeModule/
├── Dockerfile
├── requirements.txt
└── src/
    └── app.py
src/app.py is where your logic, processes, and the @function-decorated handlers live. The model weights either get baked into the image (simple, but bloats the image) or downloaded at container startup (cleaner image, slower cold start).
If you want to ship weights from your own trained models, place them in a new folder under the project root (MyComputeModule in this example). Whether to bake them in depends on how fast the Compute Module needs to warm up. As a rule of thumb, download at startup anything small or anything that updates frequently. Large weights that rarely change are better baked into the image before you push it to the registry.
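The bake-or-download choice can live in one startup helper, so the rest of app.py doesn't care which path was taken. This is a minimal sketch under assumptions: resolve_weights, the two paths, and fake_download are hypothetical names, and the download step is a stand-in for whatever fetch you actually use.

```python
import tempfile
from pathlib import Path
from typing import Callable


def resolve_weights(baked: Path, cache: Path, download: Callable[[Path], None]) -> Path:
    """Prefer weights baked into the image; otherwise download once at startup."""
    if baked.exists():
        return baked                      # baked in: nothing to fetch at boot
    if not cache.exists():
        cache.parent.mkdir(parents=True, exist_ok=True)
        download(cache)                   # paid on every cold start
    return cache


def fake_download(target: Path) -> None:
    # Hypothetical stand-in for a real fetch of the weights file.
    target.write_bytes(b"weights")


with tempfile.TemporaryDirectory() as tmp:
    baked = Path(tmp) / "weights" / "model.bin"   # hypothetical baked-in path
    cache = Path(tmp) / "cache" / "model.bin"     # hypothetical startup cache
    # No baked file yet, so the helper falls back to the download path.
    used_download = resolve_weights(baked, cache, fake_download) == cache
    # Simulate an image that ships the weights.
    baked.parent.mkdir(parents=True)
    baked.write_bytes(b"weights")
    used_baked = resolve_weights(baked, cache, fake_download) == baked

print(used_download, used_baked)
```

Call the helper at module level, next to the other startup loading, so the cost is paid once per container boot.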
Dockerfile
FROM --platform=linux/amd64 python:3.12
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src .
# Foundry requires a non-root, numeric USER
USER 5000
CMD ["python", "app.py"]
Three things go wrong here repeatedly:
- Wrong platform. The image must match the Foundry resource queue's architecture. --platform=linux/amd64 is the default. If your team builds on Apple silicon and skips the platform flag, the image fails on deploy in a way that's not immediately obvious.
- USER not numeric or not set. The container must run as a non-root numeric UID. USER 5000 is the convention. USER appuser will fail.
- Missing the entry point. The CMD must run a script that starts the Compute Module client (the SDK handles this if you call start() or use the @function decorator with a main block).
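These three failure modes are mechanical enough to check before a push. The sketch below is a hypothetical pre-flight script, not a Foundry tool: check_dockerfile and its messages are our own, and it only inspects the Dockerfile text. It will flag a Dockerfile that relies on passing the platform to docker build instead of pinning it in FROM, so treat that finding as a prompt to double-check, not an error.

```python
import re
from typing import List


def check_dockerfile(text: str) -> List[str]:
    """Return a list of problems matching the three failure modes above."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    from_lines = [ln for ln in lines if ln.upper().startswith("FROM ")]
    if not any("--platform=linux/amd64" in ln for ln in from_lines):
        problems.append("FROM does not pin --platform=linux/amd64")

    user_lines = [ln for ln in lines if ln.upper().startswith("USER ")]
    if not user_lines:
        problems.append("USER is not set")
    else:
        # The last USER instruction wins; ignore an optional ":group" suffix.
        user = user_lines[-1].split(None, 1)[1].split(":")[0]
        if not re.fullmatch(r"[1-9]\d*", user):   # numeric and not 0 (root)
            problems.append(f"USER {user!r} is not a non-root numeric UID")

    if not any(ln.upper().startswith(("CMD ", "ENTRYPOINT ")) for ln in lines):
        problems.append("no CMD or ENTRYPOINT")
    return problems


GOOD = """FROM --platform=linux/amd64 python:3.12
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src .
# Foundry requires a non-root, numeric USER
USER 5000
CMD ["python", "app.py"]
"""
BAD = "FROM python:3.12\nUSER appuser\n"

print(check_dockerfile(GOOD))
print(check_dockerfile(BAD))
```

It cannot tell whether app.py really starts the Compute Module client; that part still needs a local run.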
requirements.txt
foundry-compute-modules
pyarrow
numpy>=1.24,<2.0
pandas
sentence-transformers
# plus whatever else your code needs
The foundry-compute-modules library is what gives you the @function decorator and the event/context pattern. The rest are just example libraries; customize the list to your needs.
The function code
import logging
import sys
import time
from dataclasses import dataclass
from typing import List

from compute_modules.annotations import function

logging.basicConfig(level=logging.INFO, stream=sys.stdout, force=True)
log = logging.getLogger(__name__)

# ---- Startup: runs once when the container boots ---------------------------
log.info("Compute Module initializing...")
start = time.time()
# Load models, indices, reference data into module-level globals.
# Anything heavy goes here, not inside the @function.
ASSETS = {
    # "model": load_model("/app/model"),
    # "index": load_index("/app/data"),
}
log.info(f"Ready in {time.time() - start:.2f}s")

# ---- Typed payload (input) + result (output); these become data types ------
@dataclass
class SearchPayload:
    partition: str
    query: str
    limit: int

@dataclass
class Result:
    record_id: str
    label: str
    score: float

# ---- Exposed function (callable from Workshop, Actions, AIP, Pipelines) ----
@function
def search(context, event: SearchPayload) -> List[Result]:
    log.info(f"search: partition={event.partition}, query='{event.query}'")
    try:
        # Your logic: use globals loaded at startup
        # results = ASSETS["index"].query(event.query, event.limit)
        return []
    except Exception as e:
        # Log at ERROR level with the traceback, then let the failure propagate
        log.error(f"Error: {e}", exc_info=True)
        raise
...
The two parameters are non-negotiable: context and event. context carries credentials and metadata (user tokens, source credentials, OSDK access). event carries the input data. If you use a TypedDict or dataclass for the event type, the SDK can automatically infer the function schema and register it as a Foundry Function once the Compute Module is running. Skip the typing and you'll be doing manual schema registration. Don't skip the typing.
Pay close attention to the structure and types of your dataclasses. If you did step 1 and walked through the workflow and its integration points, you should know exactly what type and structure of data the output needs.
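To see why the typing matters, it helps to look at how much a typed signature exposes to any tool that inspects it. The sketch below uses only the standard library and is not the SDK's implementation: describe is a hypothetical helper, and the handler is left undecorated so the example runs without the compute_modules package.

```python
from dataclasses import dataclass, fields
from typing import List, get_args, get_origin, get_type_hints


@dataclass
class SearchPayload:
    partition: str
    query: str
    limit: int


@dataclass
class Result:
    record_id: str
    label: str
    score: float


def search(context, event: SearchPayload) -> List[Result]:
    return []


def describe(func) -> dict:
    """Build a plain-dict description of a handler from its type hints."""
    hints = get_type_hints(func)
    event_type = hints["event"]
    return_type = hints["return"]
    # Unwrap List[Result] to Result so its fields can be listed.
    item_type = get_args(return_type)[0] if get_origin(return_type) is list else return_type
    return {
        "name": func.__name__,
        "inputs": {f.name: f.type.__name__ for f in fields(event_type)},
        "output_item": {f.name: f.type.__name__ for f in fields(item_type)},
    }


schema = describe(search)
print(schema)
```

Remove the annotations from search and describe has nothing to work with, which is roughly the position an untyped handler leaves schema inference in.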
Publishing to the Foundry Artifact repository
Two paths:
- Local build, push to a Foundry Artifact repository. Build locally with Docker, then follow the in-platform instructions in the Compute Module's Documentation tab to push to the registry. On this path, the image is built and pushed with commands that look roughly like this:
docker build --platform linux/amd64 -t <name>-container-registry.palantirfoundry.com/{image-name-tag} .
export REPOSITORY="{AUTO-GENERATED RID FROM FOUNDRY COMPUTE MODULE}"
export TOKEN="{AUTO-GENERATED TOKEN FROM FOUNDRY COMPUTE MODULE}"
echo "$TOKEN" | docker login -u "$REPOSITORY" --password-stdin <name>-container-registry.palantirfoundry.com
docker push --platform=linux/amd64 <name>-container-registry.palantirfoundry.com/{image-name-tag}
- Code Repositories with Python Compute Module template. Skip the local Docker step entirely. Foundry handles the build and the image push. Faster iteration loop for Python-only Compute Modules. We use this path more often than the local one for new modules.
Module configuration
If your container needs to call out (an external API, a model registry, your own GPU model hosting), use Foundry Sources for the credentials. For each Source you import into the Compute Module configuration, a pre-configured client is generated for you. Do not bake credentials into the image. We've seen it. It never ends well.
GPU and resource configuration
The replica defaults are 1 vCPU, 4 GB memory, and no GPU. For ML models, you'll bump CPU and memory and might want to request a GPU on the replica config. Per-replica config matters because it sets the cost floor and the performance ceiling for every parallel call your module handles.
A rough heuristic: start with the smallest GPU your model can run on, the lowest concurrency limit that supports your latency target, and adjust from there. Over-provisioning early is more painful than under-provisioning early, because cost compounds and tuning down is harder than tuning up.
Start and smoke-test
Once configured, start the Compute Module from the Overview page. Use the Query panel at the bottom to test the function before wiring it into Workshop or AIP Logic. If the function doesn't show up in the Functions tab, the schema inference didn't run, which usually means the SDK isn't started correctly in your entry point, or your function signatures aren't typed. Either way, the culprit is most likely in app.py, so check there first.
A few failure modes worth knowing about when working with @function handlers and custom logic in Compute Modules:
- Dataclass objects use Foundry-unsupported types. The Compute Module SDK introspects your @dataclass payload and result types to register them with the Ontology, and it only supports a specific set of primitives (str, int, float, bool, lists of those, and nested dataclasses). If you use types like datetime, Optional[X], or Any in your dataclass fields, registration silently fails and the function never appears. Rewrite the object to stay as close as possible to supported types: convert timestamps to ISO strings, replace Optional with a sentinel value (-1.0 for floats, "" for strings), and flatten dicts into explicit fields or a List[KeyValue] nested dataclass.
- Heavy assets are loaded inside the decorated function instead of at module level. If you're loading a large custom trained model or pulling reference data, do it at the top of the file, that is, outside and above the @function handler, so it runs once when the container boots. Anything inside the function body re-runs on every invocation, which not only kills latency but can push individual calls past their timeout and make the function look broken in the Functions tab. The pattern is: load into module-level globals at import time, then read from those globals inside the handler.
- Not waiting long enough after publishing. Even after the container reports healthy, it can take another 1–2 minutes for Foundry to pick up the registered functions and surface them in the Functions tab. Before you start tearing your code apart looking for a bug, give it a couple of minutes and refresh. If you jump straight into changing the dataclass or rewriting the handler, you're likely chasing a problem that would have resolved itself, and then your "fix" triggers another rebuild and another wait cycle.
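Because the unsupported-type failure is silent, a small local check is worth running before every push. The sketch below encodes the supported set as we described it above (str, int, float, bool, lists of those, and nested dataclasses). It is our own helper, not part of the SDK, and RawResult, SafeResult, and KeyValue are hypothetical examples.

```python
from dataclasses import dataclass, is_dataclass
from datetime import datetime
from typing import Any, List, Optional, get_args, get_origin, get_type_hints

PRIMITIVES = (str, int, float, bool)


def is_supported(tp) -> bool:
    """True for primitives, lists of supported types, and nested dataclasses."""
    if tp in PRIMITIVES:
        return True
    if get_origin(tp) is list:
        args = get_args(tp)
        return len(args) == 1 and is_supported(args[0])
    if is_dataclass(tp):
        return all(is_supported(t) for t in get_type_hints(tp).values())
    return False


def unsupported_fields(cls) -> List[str]:
    """Names of fields whose types fall outside the supported set."""
    return [name for name, tp in get_type_hints(cls).items() if not is_supported(tp)]


@dataclass
class RawResult:              # the shape that fails to register
    record_id: str
    score: Optional[float]
    created_at: datetime
    extra: Any


@dataclass
class KeyValue:
    key: str
    value: str


@dataclass
class SafeResult:             # the same data, restated in supported types
    record_id: str
    score: float              # -1.0 means "no score"
    created_at: str           # ISO 8601 string
    extra: List[KeyValue]


print(unsupported_fields(RawResult))   # ['score', 'created_at', 'extra']
print(unsupported_fields(SafeResult))  # []
```

Run it against every payload and result dataclass in app.py. An empty list doesn't guarantee registration, but a non-empty one tells you where to look first.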
This checklist sounds long. It is. It's also the difference between a two-hour deploy and a two-day debugging session. Run it the same way every time.
3. Plug in: OSDK, Workshop, and the data-transfer trap
Once the module is running and the functions are registered, you can call them from anywhere a Foundry Function is callable: Workshop, Slate, AIP Logic, AIP Chatbot Studio (as a Function tool), OSDK applications, and Code Repositories.
The pattern we use most: register the function, then expose it via OSDK to a Workshop widget or an external OSDK app. The OSDK integration doc walks through the wiring. The short version is: the Compute Module function becomes a callable in the generated OSDK client, and your app invokes it like any other Foundry Function.
A few things to be careful about here.
OSDK data-transfer limits
Compute Module function calls serialize over a request/response interface, which imposes practical limits on payload size per call, in both directions. If you're tempted to send or return a base64-encoded image, video, or anything larger than ~100 MB, stop and batch the payload instead. This also means the logic under your @function decorator needs to be restructured to process requests in batches.
In our experience, the limit sits around 100 MB, and Foundry's error log is generic and unhelpful for diagnosing which OSDK call to your Compute Module function is failing. If your Compute Module runs fine standalone but OSDK calls fail, this is a likely culprit.
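On the calling side, batching can be as simple as grouping records until a size budget is reached. The sketch below is one way to do it, assuming JSON-serializable records. batch_by_size is a hypothetical helper, and the 200-byte budget is only there to make the demo visible; a real budget would sit well below the ~100 MB mark.

```python
import json
from typing import Iterable, Iterator, List


def batch_by_size(items: Iterable[dict], max_bytes: int) -> Iterator[List[dict]]:
    """Yield lists of items whose JSON-encoded size stays within max_bytes."""
    batch: List[dict] = []
    batch_bytes = 2                      # the enclosing "[" and "]"
    for item in items:
        # Size of the item plus a ", " separator (a slightly conservative estimate).
        item_bytes = len(json.dumps(item).encode("utf-8")) + 2
        if item_bytes + 2 > max_bytes:
            raise ValueError("a single item exceeds the per-call budget")
        if batch and batch_bytes + item_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 2
        batch.append(item)
        batch_bytes += item_bytes
    if batch:
        yield batch


records = [{"id": i, "text": "x" * 40} for i in range(10)]
batches = list(batch_by_size(records, max_bytes=200))
for batch in batches:
    # Each batch would become one call to the Compute Module function.
    print(len(batch), len(json.dumps(batch).encode("utf-8")))
```

A single item that exceeds the budget raises instead of being sent, which is usually the signal to pass a reference to the data and not the data itself.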
Cold start latency
If you set the minimum replicas to zero (the scale-to-zero option), the first call after a quiet period will incur a cold start. For most workloads this is fine. For interactive Workshop calls that need to feel instant, set the minimum to one. The cost of a warm replica is the price of predictable latency. Make the call deliberately.
Container-warmed model state
A single replica can keep your model loaded in memory across calls. Two replicas have two model copies. State does not travel across replicas. Don't rely on accrued state between calls hitting different replicas.
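A toy simulation makes the point concrete. Here each Replica object stands in for one container with its own in-memory cache. The round-robin routing is an assumption for illustration, not a statement about how Foundry dispatches calls.

```python
from itertools import cycle


class Replica:
    """Stand-in for one container: its cache lives only in this process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache = {}

    def handle(self, key: str) -> str:
        if key in self.cache:
            return f"{self.name}: hit"
        self.cache[key] = True
        return f"{self.name}: miss"


# Assumed round-robin dispatch across two replicas.
replicas = cycle([Replica("replica-a"), Replica("replica-b")])
outcomes = [next(replicas).handle("doc-1") for _ in range(4)]
print(outcomes)
# ['replica-a: miss', 'replica-b: miss', 'replica-a: hit', 'replica-b: hit']
```

The second call misses even though the first call already did the work, because it landed on a different replica. Anything that must survive across calls belongs in a dataset or the Ontology, not in process memory.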
Function permissions
Function execution mode supports two permission modes: no platform permissions (the function runs without OSDK access) and application permissions (a service user the function runs as for platform calls). If your function needs to read Ontology objects or write to a dataset, pick application permissions. If it's pure compute on the event payload, no platform permissions is simpler.
4. Manage replicas and plan for scale
The platform handles horizontal scaling for you, but only well if the configuration matches your workload. The scaling doc has the precise numbers. The shorthand:
Replicas
- Minimum replicas. Zero means scale to zero when idle (saves cost, costs cold-start latency). Non-zero means always-warm.
- Maximum replicas. The hard cap. Set it based on cost ceiling and the expected peak.
Concurrency limit
The maximum number of requests a single replica can process simultaneously. Default is 1, which means sequential processing per replica. For CPU-bound workloads, increasing concurrency can be efficient. For GPU-bound workloads (like CV models), the GPU is usually the bottleneck, and a concurrency above 1 just queues requests on the GPU. Most of our GPU-backed Compute Modules run concurrency 1, with horizontal scaling doing the work.
Auto-scaling thresholds
- Scale-up triggers when load, defined as current running jobs divided by (current replicas × concurrency limit), is at or above 0.75 for one minute.
- Scale-down triggers when load is below 0.75 for thirty minutes.
The asymmetry matters. Scaling up is fast (one minute); scaling down is deliberate (thirty minutes). This is the right shape for interactive workloads where you'd rather pay for a few extra warm replicas than serve a slow first call.
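The load figure is easy to compute by hand, which helps when picking a concurrency limit. The sketch below reads the definition above as running jobs divided by the product of replicas and concurrency limit, and uses the 0.75 threshold quoted above. It checks a single instant and doesn't model the one-minute and thirty-minute windows.

```python
SCALE_THRESHOLD = 0.75   # threshold quoted in the text above


def load(running_jobs: int, replicas: int, concurrency_limit: int) -> float:
    """Load = running jobs / (replicas * concurrency limit)."""
    return running_jobs / (replicas * concurrency_limit)


def at_scale_up_level(running_jobs: int, replicas: int, concurrency_limit: int) -> bool:
    """True when the instantaneous load is at or above the scale-up threshold."""
    return load(running_jobs, replicas, concurrency_limit) >= SCALE_THRESHOLD


# 3 replicas at concurrency 1: two running jobs is 0.67, three is 1.0.
print(round(load(2, 3, 1), 2), at_scale_up_level(2, 3, 1))   # 0.67 False
print(round(load(3, 3, 1), 2), at_scale_up_level(3, 3, 1))   # 1.0 True
# 4 replicas at concurrency 2: six running jobs sits exactly on the threshold.
print(load(6, 4, 2), at_scale_up_level(6, 4, 2))             # 0.75 True
```

With concurrency 1, a three-replica module reaches the scale-up level only when every replica is busy, so there is no headroom before the first queued request.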
Predictive scaling
Compute Modules track historic query load and try to scale up before anticipated demand. This is helpful for predictable load patterns (business hours, end-of-quarter document processing) and irrelevant for spiky unpredictable load. Monitor it for the first few weeks of production and adjust.
Scheduled overrides
If your Workshop app is mostly used 8am-6pm in one timezone, set a scheduled override to bump minimum replicas during those hours and let it drop to zero overnight. The cost difference adds up over a quarter.
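A quick back-of-the-envelope calculation shows the size of the difference. The sketch assumes a 90-day quarter, one warm replica, and an override that applies every day; it counts replica-hours only, since pricing depends on your resource configuration.

```python
HOURS_PER_DAY = 24
WARM_WINDOW_HOURS = 18 - 8      # 8am to 6pm
DAYS_IN_QUARTER = 90            # assumption for the estimate


def warm_replica_hours(days: int, hours_per_day: int, min_replicas: int = 1) -> int:
    """Replica-hours spent keeping the minimum replicas warm."""
    return days * hours_per_day * min_replicas


always_on = warm_replica_hours(DAYS_IN_QUARTER, HOURS_PER_DAY)
scheduled = warm_replica_hours(DAYS_IN_QUARTER, WARM_WINDOW_HOURS)
saved = always_on - scheduled

print(always_on, scheduled, saved)   # 2160 900 1260
print(f"{saved / always_on:.0%}")    # 58%
```

Restricting the override to weekdays would widen the gap further.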
A few things the docs don't tell you, but we've learned
A short list, in no order, for the team about to deploy their first one.
- The function schema only appears in the Functions tab after the module is running. If your function isn't showing up, the module isn't actually running or your entry point isn't calling start(). Restart, check the logs.
- The Compute Module's logs are your best friend. They show up in the Overview page once the module is running. Print liberally during the first deploy. Strip the noise later.
- Image size affects deploy time, not just storage. Bake only the model weights you need. Multi-stage Dockerfiles help.
- Test locally before pushing. docker run with the same --platform=linux/amd64 flag will catch 80% of the issues that would otherwise show up only after a push and start.
- The Query panel in the Compute Module UI is the fastest test loop. Use it before wiring the function into Workshop. Catch input/output shape mismatches early.
Where this fits in our Arsenal
We use Compute Modules for almost every workload that needs to bring a non-Foundry-native library into the platform. The Arsenal modules that ship as Compute Modules include the YOLO Modeling Objective, the Paddle Detection + Cropping Pipeline, the Paddle Table Structure Extractor, the Handwritten Row Extractor, and a custom fine-tuned LLM agent. Each is a Compute Module under the hood, configured around the patterns above, and stitched into the rest of Foundry through Action Types, AIP Logic functions, and Workshop widgets.
Reusing those modules saves us the work of deploying a new Compute Module from scratch on every engagement. The shape of the deploy stays the same; the contents change.
What am I missing?
If you've shipped Compute Modules at scale and have a workflow trick that's not in this guide, write me. I'd rather collect a real list of edge cases from people who've debugged them in production than re-discover them on the next deploy. Same offer if you think we're wrong on any of the recommendations above. Containers and configurations move fast; the field guide should move with them.
If you're standing up your first Compute Module and want a second pair of eyes on the architecture before you start the deploy, talk to us. We'll send you the same checklist we use internally.