Problem Statement:
A project needs to optimize certain model inputs so that the model produces the desired output targets. However, the model is packaged in a Docker container, so it cannot be imported directly into the transform. A sidecar container is therefore needed for this project.
Option A: Run in parallel with multiple container instances
Document Source: https://www.palantir.com/docs/foundry/transforms-python/transforms-sidecar/#example-2-parallel-execution
- As in the linked example, define all combinations of inputs as a DataFrame, one row per combination:
| policyId | param 1 | param 2 | ... |
| -------- | ------- | ------- | --- |
| ... | 0.1 | 0.2 | ... |
| ... | 0.2 | 0.4 | ... |
| ... | 0.3 | 0.5 | ... |
- Within the transform, define a `main` function that formats each row into input files for the model container, then map it over the repartitioned rows so that multiple container instances run in parallel, as below:
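```python
from transforms.api import transform, Input, Output
from transforms.sidecar import sidecar, Volume
# format_inputs_into_shared_volume etc. are assumed project helpers (the linked example defines similar ones).
@sidecar(image="<model-image>", tag="<tag>", volumes=[Volume("shared")])
@transform(
    output=Output("<output dataset>"),
    output_rows=Output("<output rows dataset>"),
    source=Input("<policy input dataset>"),
)
def compute(output, output_rows, source, ctx):
    def main(a_row):
        format_inputs_into_shared_volume(a_row)  # write this row's inputs as files
        copy_start_flag()                        # signal the sidecar that inputs are ready
        wait_for_done_flag()                     # block until the model run finishes
        copy_output_files(output)                # copy result files to the output dataset
        return post_processing_results(a_row)    # e.g. a (policyId, result) tuple
    # One partition per executor (4 assumed), each executor with its own sidecar instance
    results = source.dataframe().repartition(4).rdd.map(main)
    output_rows.write_dataframe(results.toDF())
```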
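The first step, building the DataFrame of input combinations, can be sketched in plain Python as a full grid over candidate values. This is a minimal sketch under assumptions: `build_input_grid` is a hypothetical helper, the parameter names `param_1`/`param_2` and their candidate values are illustrative, and `policyId` is simply assigned sequentially.

```python
import itertools


def build_input_grid(param_values):
    """Return one dict per combination of candidate parameter values.

    param_values maps each parameter name to the list of values to try.
    Each row gets a sequential policyId (a hypothetical ID scheme).
    """
    names = list(param_values)
    rows = []
    for i, combo in enumerate(itertools.product(*(param_values[n] for n in names))):
        row = {"policyId": i}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


grid = build_input_grid({"param_1": [0.1, 0.2], "param_2": [0.2, 0.4, 0.5]})
print(len(grid))  # 6
print(grid[0])    # {'policyId': 0, 'param_1': 0.1, 'param_2': 0.2}
```

Inside the transform, the resulting list of dicts could be passed to `ctx.spark_session.createDataFrame(...)` to obtain the input DataFrame. A full grid grows multiplicatively with the number of parameters, so large searches may need sampling instead.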
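The start and done flags form a simple file-based handshake over the shared volume: the transform writes the inputs and a start flag, and the sidecar runs the model and writes a done flag. A minimal sketch of the transform side, assuming flag files named `start_flag` and `done_flag` in a directory passed in as `shared_dir`; these signatures are hypothetical and may differ from the helpers in the linked example.

```python
import time
from pathlib import Path

# Hypothetical flag-file names; the sidecar-side script must use the same ones.
START_FLAG = "start_flag"
DONE_FLAG = "done_flag"


def clear_flags(shared_dir):
    """Remove flags left over from a previous row (no error if absent)."""
    for name in (START_FLAG, DONE_FLAG):
        Path(shared_dir, name).unlink(missing_ok=True)


def copy_start_flag(shared_dir):
    """Create an empty start flag: tells the sidecar its input files are complete."""
    Path(shared_dir, START_FLAG).touch()


def wait_for_done_flag(shared_dir, timeout_s=3600.0, poll_s=1.0):
    """Poll until the sidecar writes the done flag; raise TimeoutError after timeout_s."""
    done = Path(shared_dir, DONE_FLAG)
    deadline = time.monotonic() + timeout_s
    while not done.exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"{DONE_FLAG} not found in {shared_dir} after {timeout_s}s")
        time.sleep(poll_s)
```

Because `rdd.map` processes several rows per executor against the same shared volume, clearing both flags before writing each row's inputs prevents a leftover done flag from the previous row making the wait return immediately.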
Option B: Run in parallel within a single container instance
Use a parallel-execution library such as Ray [https://docs.ray.io/en/latest/ray-overview/index.html] inside the container: package the model and the parallel driver together in the image, so all input combinations run inside the container and the transform only needs to trigger it once.
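For Option B, the driver running inside the container only needs to fan the input combinations out over workers. The sketch below swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` as a dependency-free stand-in for Ray; with Ray, `run_one` would instead be decorated with `@ray.remote` and the results collected with `ray.get([run_one.remote(c) for c in combinations])`. `run_all` and `run_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(combinations, run_one, max_workers=4):
    """Apply run_one to every input combination concurrently.

    run_one is a hypothetical callable that runs the model once for one
    combination and returns its result. Results are returned in the same
    order as `combinations` (Executor.map preserves input order).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, combinations))


# Toy run_one standing in for a real model run.
print(run_all([{"policyId": i, "x": i} for i in range(3)],
              lambda c: (c["policyId"], c["x"] * 2)))
# [(0, 0), (1, 2), (2, 4)]
```

Threads are enough when each `run_one` call spends its time waiting on an external model process (for example one started via `subprocess.run`); if the work is CPU-bound Python, a process pool or Ray is the better fit.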