I have been trying out compute modules in pipeline mode today, in preparation for an “email polling and forwarding to Foundry” workflow.
I am struggling to understand how I can commit a transaction. It seems the build2 system automatically opens a transaction - fine. I can upload files using the dataproxy - been there, done that.
But there is no way to indicate, “hey I am done with this” / commit the transaction so that downstream pipelines of the output dataset get triggered.
It seems the only way is to use stream-proxy? I don’t want to use stream-proxy, because I have binary files to upload to a dataset for processing.
I believe the transaction is automatically opened when the CM starts running and closed when the CM terminates. As such, to commit the transaction, I think you would just let the CM exit.
You might also be able to leverage the Foundry public API and use the create and commit transaction endpoints. Alternatively, it might be worth thinking about whether you can leverage a transactionless mediaset to put the files into.
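If the public-API route works permission-wise, a minimal sketch of creating and committing a transaction could look like the following. The stack URL and token handling are placeholders, and the exact v2 endpoint paths and payload/response fields (`transactionType`, `rid`) are assumptions to verify against your stack’s API documentation; note this is also where the transaction type (SNAPSHOT, APPEND, UPDATE) would be chosen explicitly.

```python
import json
import urllib.request

FOUNDRY_URL = "https://your-stack.palantirfoundry.com"  # hypothetical stack URL


def _post(path: str, token: str, body=None) -> dict:
    """POST an optional JSON body to the Foundry public API and parse the response."""
    req = urllib.request.Request(
        f"{FOUNDRY_URL}/api/v2{path}",
        data=json.dumps(body).encode() if body is not None else b"",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def open_transaction(dataset_rid: str, token: str, txn_type: str = "APPEND") -> str:
    """Create a transaction of the given type and return its RID."""
    txn = _post(f"/datasets/{dataset_rid}/transactions", token,
                {"transactionType": txn_type})
    return txn["rid"]


def commit_transaction(dataset_rid: str, transaction_rid: str, token: str) -> None:
    """Commit the transaction so downstream pipelines on the dataset can trigger."""
    _post(f"/datasets/{dataset_rid}/transactions/{transaction_rid}/commit", token)
```

With something like this, the CM could keep running in its polling loop and commit a fresh transaction per batch of uploaded files, instead of relying on process exit.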
What are you trying to do here / why are you using a CM for this as opposed to a standard pipeline (external transform)?
Compute modules are the cheapest option for running 24/7 ingestion workloads, thanks to the 0.2 usage-rate multiplier and the ability to run with a 0.1 CPU share.
In this workflow I will connect to an on-prem IMAP server (via the agent proxy) and poll for messages in the inbox, transferring the messages into a Foundry dataset as binary files.
I tried manually handling transactions but that failed with permission errors. Stopping the compute module is not an option since it should keep polling.
In addition, I am wondering how to configure the transaction type (snapshot, append, update). I think it’s not possible since build2 handles it, but maybe someone can confirm?
I know I can fall back to a stream proxy or other workarounds such as acting as my own TPA, but I would first like to understand why regular dataset outputs are mentioned in the docs/walkthroughs.