How do I create a dataset from my streamlit user inputs

megaquestion · April 22, 2025, 6:53pm

Hello! I am using Streamlit within Code Workspaces to build an interface. I have several text boxes, and when the user enters text into them and clicks the button at the bottom, I want that text to go into a dataset in Foundry. Is there special syntax I need to use to write it to a dataset?

192c7163b888db45ab2e · April 22, 2025, 9:28pm

Hi!

In Code Workspaces, while building your Streamlit interface, you should see a Data button in the left-hand panel. Clicking it should give you an option to write data to a new dataset.
You can paste the code snippet it provides into your Streamlit app and use it to write the data you received from the text boxes into this new Foundry dataset.

One important point:

To be written back as a tabular dataset, the data must be a pandas DataFrame, a polars DataFrame or a pyarrow Table.
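As a rough sketch of what that could look like with several text boxes: collect the values into a dict and turn it into a one-row pandas DataFrame. The helper name `inputs_to_dataframe` and the column names `name` and `comment` are made up for illustration; your columns would need to match your own dataset.

```python
import pandas as pd


def inputs_to_dataframe(inputs: dict) -> pd.DataFrame:
    """Turn {column name: text box value} into a one-row DataFrame."""
    # A list containing one dict gives one row, with the dict keys as columns.
    return pd.DataFrame([inputs])


# Hypothetical values, standing in for what st.text_input would return.
row = inputs_to_dataframe({"name": "Ada", "comment": "hello"})
```

The resulting `row` is the kind of object the write snippet expects.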

megaquestion · April 23, 2025, 6:45pm

Thanks for this! I am trying to set up a button so that when the user clicks it, all the text information is appended to the dataset. I tried to use your code and I am getting a “‘Dataset’ object has no attribute ‘open_transaction’” error. Do you know how to fix this? I only changed the name of my dataset and the values within new_entry.

192c7163b888db45ab2e · May 6, 2025, 4:49pm

Thanks for coming back to me on this.

Sorry, I went a bit too fast last time: this API is not yet supported in Jupyter Workspace.
So unfortunately, for now we have to use a Snapshot transaction each time we want to add new data to the dataset (i.e. re-write all previous rows as well). I will update the comment above for posterity.

I wrote a Streamlit app that should do what you are looking for.
Here is the process to get it working.

1. In the folder of your app, create a new dataset.
2. Add a schema to it (go to the Details tab of the dataset, then the schema selection).
Here is a very simple example schema (the one used by the code snippets that follow):

{
  "fieldSchemaList": [
    {
      "type": "STRING",
      "name": "Input",
      "nullable": true,
      "userDefinedTypeClass": null,
      "customMetadata": {},
      "arraySubtype": null,
      "precision": null,
      "scale": null,
      "mapKeyType": null,
      "mapValueType": null,
      "subSchemas": null
    }
  ],
  "primaryKey": null,
  "dataFrameReaderClass": "com.palantir.foundry.spark.input.ParquetDataFrameReader",
  "customMetadata": {
    "format": "parquet"
  }
}

3. Go to the Jupyter Workspace where you are building the Streamlit app.
3.a Open the Data tab and add the newly created dataset via “read existing input”.
3.b Add a first “fake” row to the dataset: open a Jupyter notebook file and run the following:

from foundry.transforms import Dataset
import pandas as pd
test_streamlit_dataset = Dataset.get("your_dataset_name")
# This works with the schema I have created, update the dataframe to write if you need a more complex schema.
test_streamlit_dataset.write_table(pd.DataFrame({'Input': ["empty"]}))

4. Create your Streamlit app, taking inputs from users and writing them back to the dataset.
Here is a quick Streamlit app example (I have tested it, so it should work):

import streamlit as st
import pandas as pd
from foundry.transforms import Dataset

# Streamlit app title
st.title("Add Text to DataFrame")

# Text input box
user_input = st.text_input("Enter some text:")

# Button to add text to DataFrame
if st.button("Add to DataFrame"):
    if user_input:  # Check if the input is not empty
        # Snapshot write: read all existing rows, append the new one, write everything back
        test_streamlit_dataset = Dataset.get("your_dataset_name")
        new_row = pd.DataFrame({'Input': [user_input]})
        updated_table = pd.concat([test_streamlit_dataset.read_table(format="pandas"), new_row], ignore_index=True)
        test_streamlit_dataset.write_table(updated_table)
        st.success("Text added to DataFrame!")
    else:
        st.warning("Please enter some text before adding.")

It should do what you are looking for!

Just as a warning: you might hit concurrency issues if a lot of people use the app at the same time. Each write re-writes the whole dataset, so if two users read and then write at the same moment, one write can overwrite the other.
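One possible way to reduce that risk, sketched under assumptions rather than tested in Foundry: guard the read-then-write step with a lock so only one user at a time can do it. The names `append_rows` and `InMemoryStore` are hypothetical; `InMemoryStore` is only a stand-in for the dataset so the sketch runs anywhere, and rows are plain dicts instead of a DataFrame.

```python
import threading

# One lock shared by all threads in this Python process (assumption: every
# user session of the app runs in the same process).
_write_lock = threading.Lock()


def append_rows(read_all, write_all, new_rows):
    """Read every existing row, add new_rows, and write the full set back.

    read_all and write_all are hypothetical callables standing in for the
    dataset read and write calls.
    """
    with _write_lock:  # only one read-modify-write at a time
        rows = list(read_all())
        rows.extend(new_rows)
        write_all(rows)
        return len(rows)


class InMemoryStore:
    """Stand-in for the dataset so the sketch runs without Foundry."""

    def __init__(self):
        self.rows = []

    def read(self):
        return list(self.rows)

    def write(self, rows):
        self.rows = list(rows)


# Simulate 20 users clicking the button at the same time.
store = InMemoryStore()
threads = [
    threading.Thread(
        target=append_rows,
        args=(store.read, store.write, [{"Input": f"text {i}"}]),
    )
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a real Streamlit app the script is re-run on every interaction, so the lock would need to be created once and shared, for example by returning it from a function decorated with `st.cache_resource`. Note that this only serializes writes made from the same app process; it does not protect against something else (a notebook, a second app instance) writing to the dataset at the same time.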