It’s happened with multiple models, and the only solution seems to be to trash the model and make a new one. Basically, when you run a transform that publishes to a model output, but then separately build another transform that references an older version of that model via a ModelInput, something shifts in the transactions on the model: the model asset UI shows it’s updated to the most recent transform, but the other transforms, as well as any calls to the model adapter (i.e. from the evaluator repo), fail with a 403 error (Permission Denied), always against the prior model version. Nothing fixes this other than pinning the version of the ModelInput, which of course is not a good solution. I just have to chuck the model and build it fresh.
A 403 Client Error : Forbidden error occurred calling https://waypoint-envoy.rubix-system.svc.cluster.local:8443/compute/a82899/foundry/models-api/models/api/model/ri.models.main.model.b2f4049f-af71-4c3e-ac89-0cacf6c22a9f/versions/ri.models.main.model-version.dd06de14-a428-4b80-9a3b-642a7f0fc241 . ErrorName: Security:PermissionDenied . Please review and contact support if the problem persists
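For reference, by “fixing the version” I mean pinning the ModelInput to a specific model version RID, something like this (a sketch; the model_version keyword argument is how I understand the API):

```python
from palantir_models.transforms import ModelInput

# Pinning the input to a fixed model version (RIDs taken from the error
# above). This sidesteps the 403, but the transform then never picks up
# newly published model versions.
pinned_input = ModelInput(
    "ri.models.main.model.b2f4049f-af71-4c3e-ac89-0cacf6c22a9f",
    model_version="ri.models.main.model-version.dd06de14-a428-4b80-9a3b-642a7f0fc241",
)
```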
Hi @shah, I’m trying to reproduce your error but am unable to. I’m hoping you can give me some more details, but based on your description I came up with the following:
First I run model training using the adapter and “training” code below. The adapter just returns a df containing whatever value is passed to the model when it’s created (in this case I am using the current time).
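The adapter source wasn’t quoted above, but it is roughly this (a minimal sketch; the DillSerializer choice and the single-column df_in/df_out api are assumptions):

```python
import pandas as pd

import palantir_models as pm
from palantir_models_serializers import DillSerializer


class ExampleModelAdapter(pm.ModelAdapter):
    # "value" is whatever gets passed in at publish time -- the training
    # timestamp in this repro.
    @pm.auto_serialize(value=DillSerializer())
    def __init__(self, value):
        self.value = value

    @classmethod
    def api(cls):
        # The input is ignored by predict; the output is a one-column df.
        inputs = {"df_in": pm.Pandas()}
        outputs = {"df_out": pm.Pandas()}
        return inputs, outputs

    def predict(self, df_in):
        # Return the stored value regardless of the input supplied.
        return pd.DataFrame({"value": [self.value]})
```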
The “training” code is:
```python
from transforms.api import lightweight, transform
from palantir_models.transforms import ModelOutput

from main.model_adapters.adapter import ExampleModelAdapter

import time


@lightweight
@transform(
    model_output=ModelOutput("ri.models.main.model.0d3821ef-b692-4f40-953b-9570aa243eb4"),
)
def compute(model_output):
    # Wrap the current time in the adapter and publish it as a new
    # model version.
    now = str(time.time())
    foundry_model = ExampleModelAdapter(now)
    model_output.publish(
        model_adapter=foundry_model,
        notes=now,
    )
```
and the inference code:
```python
from transforms.api import Output, lightweight, transform
from palantir_models.transforms import ModelInput


@lightweight
@transform(
    model_input=ModelInput("ri.models.main.model.0d3821ef-b692-4f40-953b-9570aa243eb4"),
    output=Output("ri.foundry.main.dataset.9c6db9d1-e1d1-4b8d-8cea-1366b0f95f8e"),
)
def compute(model_input, output):
    # Run inference and record which model version actually served it.
    res = model_input.predict("")
    res["rid"] = model_input.model_version_rid
    output.write_pandas(res)
```
I run the training build, and once it completes I run the subsequent inference build. The output dataset of the inference build has the correct values (the time is correct, and the model version RID in the rid column is correct), so I am not sure what I am missing in order to reproduce your error.
Can you try it without @lightweight? My other concern with this test is that the training doesn’t run long enough for you to start inference against a previous version of the model while training is still in flight, if that makes sense. Maybe add some kind of timer loop in the training, and then run your inference while it’s training; see the sketch below.
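Something like this in the training transform is what I have in mind (a sketch; the ten-minute stall is arbitrary):

```python
from transforms.api import lightweight, transform
from palantir_models.transforms import ModelOutput

from main.model_adapters.adapter import ExampleModelAdapter

import time


@lightweight
@transform(
    model_output=ModelOutput("ri.models.main.model.0d3821ef-b692-4f40-953b-9570aa243eb4"),
)
def compute(model_output):
    now = str(time.time())
    # Stall long enough that the inference build can be launched while
    # this training build is still running.
    for _ in range(60):
        time.sleep(10)
    model_output.publish(
        model_adapter=ExampleModelAdapter(now),
        notes=now,
    )
```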
Yeah, I can give that a shot, but I’m confused: what are you expecting will happen if you have a training job running and launch an inference job while that is still running? It should resolve to the previous model version, not the version that is currently being produced (and if it doesn’t, then yes, there is something wrong and we can take a look at fixing it).