Model adapter when building a LightGBM model

TQa-tokuda · September 10, 2024, 11:17am

Dear all,

When I build models using scikit-learn, I can import and use foundry-sklearn-adapter.

Now, however, I am trying to build models with LightGBM.
Are there any libraries specifically for LightGBM?
Or should I use custom adapters from the adapter.py file in the Model Training Template?

tucker · September 10, 2024, 2:05pm

Hi @TQa-tokuda!

For a model built with LightGBM, I would recommend writing your own custom adapter. I would assume that lightgbm supports serialization/deserialization of models using dill (similar to pickle, but covers more cases), so you can use the DillSerializer auto serializer to handle the saving and loading of your model.

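import pandas as pd
import palantir_models as pm
from palantir_models_serializers import DillSerializer


class LightGbmAdapter(pm.ModelAdapter):
    @pm.auto_serialize(
        model=DillSerializer(),
    )
    def __init__(self, model):  # you would pass your lightgbm model to the adapter via the init
        self.model = model

    @classmethod
    def api(cls):
        # Define your API here. Below is an example API showing inputs and outputs.
        # The column names and types are placeholders to replace with your own
        # (float is used here because LightGBM predicts from numeric features).
        inputs = {
            "df_in": pm.Pandas(columns=[("input_column", float)]),
            "param_in": pm.Parameter(type=str, default="default_value"),
        }
        outputs = {
            "df_out": pm.Pandas(columns=[("output_column", float)]),
        }
        return inputs, outputs

    def predict(self, df_in, param_in):
        # This is where you would call the predict method on your lightgbm model.
        # Sketch only: assumes every column of df_in is a feature the model was
        # trained on; param_in is unused in this example.
        predictions = self.model.predict(df_in)
        return pd.DataFrame({"output_column": predictions})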

If you haven’t already tried, I would recommend using Code Workspaces for model training as it provides a much more interactive experience allowing model training in Jupyter Notebooks. That way you can debug any issues with your adapter on the fly using any of the available testing methods. For example, once you construct your adapter you can test it using:

# this will return a named tuple with the fields in the tuple matching your 
# model adapter api's output fields
my_adapter.run_inference(...) # pass the inputs here

I hope this helps! Let me know if you need any more assistance!

TQa-tokuda · September 11, 2024, 7:09am

Hi, @tucker .

Thank you for your answer.
I haven’t tried it yet, so I’ll do so soon.

Also, may I ask one more question?
You recommended Code Workspaces, and Jupyter Notebook is indeed more familiar and easier for me to model in, but it sometimes costs more than using Code Repository.

Could you tell me how to reduce costs with Code Workspaces?

laksa · September 11, 2024, 9:10am

Have you considered using https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html (or Classifier) instead? It’s essentially a LightGBM-inspired implementation with sklearn APIs (and some nice features like native support for missing values and for categorical features without one-hot encoding).

TQa-tokuda · September 17, 2024, 7:55am

Thank you for the information, @laksa.

I haven’t tried it yet.
The page you linked mentions big datasets, and my model handles big datasets, so I’ll try it.