Can I ingest from an email server or a given email inbox into Foundry?

I have an email inbox that I would like to ingest into Foundry, so that my users can send attachments to be processed in a pipeline.

Is there a way to ingest emails into Foundry, e.g. by monitoring an email inbox and ingesting its content?

As of 4/30/24 there is no email connector. Your best bet is to use external transforms with Python to hit your mail server's API, or to save the emails into an S3 bucket (or similar) and ingest them from there into Foundry.

As a workaround, we are using AWS SES → S3 → Foundry.
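For the SES → S3 route, the files that land in Foundry are typically raw MIME messages, so a pipeline step has to pull the attachments out. Here is a minimal sketch using only the Python standard library. It assumes each file contains one raw message; `extract_attachments` and `build_sample_message` are hypothetical helper names, not part of any Foundry or AWS API.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attachments(raw_bytes):
    """Return (subject, [(filename, payload_bytes), ...]) for one raw MIME message."""
    # policy.default gives us the modern EmailMessage API (iter_attachments, etc.)
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    attachments = []
    for part in msg.iter_attachments():
        filename = part.get_filename() or "unnamed"
        # decode=True undoes the base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        attachments.append((filename, payload))
    return str(msg["Subject"] or ""), attachments


def build_sample_message():
    """Build a small message with one attachment, standing in for a file from S3."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "inbox@example.com"
    msg["Subject"] = "Monthly report"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"a,b\n1,2\n",
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    subject, files = extract_attachments(build_sample_message())
    print(subject, [name for name, _ in files])
```

In a transform you would call `extract_attachments` on the bytes of each ingested file and write the payloads wherever your pipeline expects them.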

If you are connecting to Outlook, you can use the Microsoft Graph API via a REST source in Foundry.
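To give an idea of what the Graph route involves, here is a rough sketch that builds a "list inbox messages" URL and flattens one page of the JSON response. The endpoint path and field names follow my understanding of the Graph v1.0 mail API, so check them against the official reference. Authentication (an OAuth bearer token) is left to the REST source and not shown. `inbox_messages_url` and `flatten_messages` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def inbox_messages_url(user_id, top=25):
    """Build the URL that lists messages in a user's inbox."""
    query = urlencode(
        {"$top": top, "$select": "id,subject,from,receivedDateTime,hasAttachments"},
        safe="$,",  # keep OData's '$' and ',' readable instead of percent-encoding them
    )
    return f"{GRAPH_BASE}/users/{quote(user_id)}/mailFolders/inbox/messages?{query}"


def flatten_messages(page):
    """Turn one response page into flat rows plus the next-page link (or None)."""
    rows = []
    for item in page.get("value", []):
        sender = (item.get("from") or {}).get("emailAddress", {})
        rows.append(
            {
                "id": item.get("id"),
                "subject": item.get("subject"),
                "sender": sender.get("address"),
                "received_at": item.get("receivedDateTime"),
                "has_attachments": bool(item.get("hasAttachments")),
            }
        )
    return rows, page.get("@odata.nextLink")


if __name__ == "__main__":
    # Hand-written sample page; a real one would come from the HTTP response body.
    sample_page = {
        "value": [
            {
                "id": "AAMk1",
                "subject": "Invoice",
                "from": {"emailAddress": {"name": "Bob", "address": "bob@example.com"}},
                "receivedDateTime": "2024-04-30T10:00:00Z",
                "hasAttachments": True,
            }
        ]
    }
    print(inbox_messages_url("inbox@example.com", top=5))
    print(flatten_messages(sample_page))
```

Following the next-page link until it is `None` walks through the whole inbox.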

There is also an available connector for Gmail.

Replying to myself, here is a specific example with Exchange:
https://community.palantir.com/t/ms-exchange-online-to-foundry/386/3

You are probably going to want some intermediary service. Yes, you could use the Gmail JDBC driver, but that won’t work for all email providers, and I don’t think it is straightforward to understand.

I built a PoC of an email streaming pipeline using EmailEngine (https://github.com/postalsys/emailengine) as a data source.

If you have seen Nylas (https://www.nylas.com/) before, this service is similar but self-hostable and more barebones. EmailEngine can connect to an email account (Gmail, Outlook, etc.) via OAuth, and you can configure it to send webhooks to a service of your choosing (in this case, Foundry) based on inbox events, like receiving a new message.

To make this work, I needed to:

  1. Get EmailEngine running - I did this locally with Docker.
  2. Hook up an email account to EmailEngine
  3. Create a Foundry Stream with the correct schema
    1. I had to send a few EmailEngine webhooks to a service so I could see and understand the JSON schema
  4. Create an EmailEngine webhook to forward info to the Foundry Stream
  5. Create a Pipeline Builder streaming pipeline to munge the JSON
  6. Profit

I hooked up my own Gmail to it and created an ontology as a PoC.
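To illustrate the "munge the JSON" step, here is a small sketch that flattens a webhook body into one flat record, the kind of shape a stream schema usually wants. The payload layout below (`account`, `event`, and a nested `data` object) is an assumption for illustration only. Inspect a few real webhooks first, as described in step 3.1, and adjust the field names. `flatten_webhook` is a hypothetical helper, not an EmailEngine or Foundry function.

```python
import json


def flatten_webhook(payload):
    """Flatten a (hypothetical) webhook body into a single flat dict."""
    body = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    data = body.get("data") or {}
    sender = data.get("from") or {}
    return {
        "account": body.get("account"),
        "event": body.get("event"),
        "message_id": data.get("id"),
        "subject": data.get("subject"),
        "from_address": sender.get("address"),
        "received_at": data.get("date"),
        "attachment_count": len(data.get("attachments") or []),
    }


if __name__ == "__main__":
    # Assumed payload shape; replace it with one captured from your own setup.
    sample = json.dumps(
        {
            "account": "my-account",
            "event": "messageNew",
            "data": {
                "id": "abc123",
                "subject": "Hello",
                "from": {"name": "Alice", "address": "alice@example.com"},
                "date": "2024-04-30T10:00:00.000Z",
                "attachments": [{"filename": "report.csv"}],
            },
        }
    )
    print(flatten_webhook(sample))
```

Keeping the record flat means the stream schema stays a simple list of typed columns, and any nested detail can be parsed later in the pipeline.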



Note: If you want to stream the emails to Foundry, which is a different approach from batch but likely useful in some scenarios, you can likely use a compute module to host and run the Docker container in Foundry as well:
https://community.palantir.com/t/streaming-from-on-prem-dashboard-service/1402/3

Here are some basic code snippets, at least to test the connection. They won’t save anything to the output, but they will at least let you test the “connection” alone.

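import logging

from transforms.api import transform, Output
from transforms.external.systems import external_systems, Source

from myproject.datasets import utils_receive, utils_send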

@external_systems(
    mail_source=Source("ri.magritte..source.XXXX")
)
@transform(
    out=Output(
        "/path/example_to_ingest"
    ),
)
def compute(mail_source, out):
    # Extract information for the connection
    email = mail_source.get_secret("additionalSecretAccountEmail")
    password = mail_source.get_secret("additionalSecretAccountPassword")
    url = "URL_OF_EMAIL_SERVER"

    logging.info("trying to connect via IMAP SSL")
    utils_receive.connect_IMAP_SSL(email, password, url) # 993

    logging.info("trying to connect via IMAP NO SSL")
    utils_receive.connect_IMAP_NO_SSL(email, password, url) # 143

    logging.info("trying to connect via SMTP SSL")
    utils_send.connect_SMTP_SSL(email, password, url)

utils_receive.py

import email
import imaplib
from email.header import decode_header


def connect_IMAP_SSL(username="your_email@example.com", password="your_password", imap_url="imap.example.com"):
    # FOR TESTING ONLY: an SSL context that does not verify certificates.
    # Uncomment these lines (and use them instead of the default connection below).
    # import ssl
    # context = ssl.create_default_context()
    # context.check_hostname = False
    # context.verify_mode = ssl.CERT_NONE
    # mail = imaplib.IMAP4_SSL(imap_url, ssl_context=context)

    # Create an IMAP4 class with SSL (default port 993), with verbose protocol debugging
    imaplib.Debug = 4
    mail = imaplib.IMAP4_SSL(imap_url)

    # Authenticate
    mail.login(username, password)

    # Select the mailbox you want to check
    mail.select("inbox")

    # Search for specific emails
    status, messages = mail.search(None, "ALL")

    # Convert messages to a list of email IDs
    email_ids = messages[0].split()

    for email_id in email_ids:
        # Fetch the email by ID
        status, msg_data = mail.fetch(email_id, "(RFC822)")

        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                # Decode the email subject (first chunk only); tolerate a missing header
                subject, encoding = decode_header(msg.get("Subject", ""))[0]
                if isinstance(subject, bytes):
                    subject = subject.decode(encoding if encoding else "utf-8")
                print("Subject:", subject)

    # Close the connection and logout
    mail.close()
    mail.logout()



def connect_IMAP_NO_SSL(username="your_email@example.com", password="your_password", imap_url="imap.example.com"):
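    # Create an IMAP4 class without SSL (default port 143).
    # Credentials are sent in clear text, so use this for connectivity testing only.
    mail = imaplib.IMAP4(imap_url)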
    # Authenticate
    mail.login(username, password)

    # Select the mailbox you want to check
    mail.select("inbox")

    # Search for specific emails
    status, messages = mail.search(None, "ALL")

    # Convert messages to a list of email IDs
    email_ids = messages[0].split()

    for email_id in email_ids:
        # Fetch the email by ID
        status, msg_data = mail.fetch(email_id, "(RFC822)")
        
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                # Decode the email subject (first chunk only); tolerate a missing header
                subject, encoding = decode_header(msg.get("Subject", ""))[0]
                if isinstance(subject, bytes):
                    subject = subject.decode(encoding if encoding else "utf-8")
                print("Subject:", subject)

    # Close the connection and logout
    mail.close()
    mail.logout()
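The functions above only print the first decoded chunk of each subject. If you later want to write rows to the output, a helper along these lines could turn each fetched message (the bytes in `response_part[1]`) into a plain dict. This is a sketch under assumptions: `decode_mime_header` and `message_to_row` are hypothetical names, and the column choice is arbitrary.

```python
import email
from email.header import decode_header
from email.utils import parseaddr


def decode_mime_header(value):
    """Decode every chunk of a possibly RFC 2047-encoded header into one string."""
    if not value:
        return ""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "utf-8", errors="replace"))
            except LookupError:
                # Unknown charset name in the header: fall back to UTF-8
                parts.append(chunk.decode("utf-8", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def message_to_row(raw_bytes):
    """Extract a few header fields from one raw RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes)
    return {
        "subject": decode_mime_header(msg.get("Subject")),
        "sender": parseaddr(msg.get("From", ""))[1],
        "date": msg.get("Date", ""),
    }


if __name__ == "__main__":
    raw = (
        b"From: Alice <alice@example.com>\r\n"
        b"Subject: =?utf-8?b?SGVsbG8gV29ybGQ=?=\r\n"
        b"Date: Tue, 30 Apr 2024 10:00:00 +0000\r\n"
        b"\r\n"
        b"body\r\n"
    )
    print(message_to_row(raw))
```

A list of such dicts is easy to convert into a dataframe before writing it to the transform output.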

utils_send.py


import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def connect_SMTP_SSL(username="your_email@example.com", password="your_password", smtp_url="smtp.example.com", recipient="myemail@domain.com"):
    # Create the email
    msg = MIMEMultipart()
    msg["From"] = username
    msg["To"] = recipient
    msg["Subject"] = "Test Email"

    # Body of the email
    body = "This is a test email from Python script."
    msg.attach(MIMEText(body, "plain"))

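    # Connect to the SMTP server on port 587 and upgrade the connection with STARTTLS
    # (despite the function name, this is not implicit SSL, which would use smtplib.SMTP_SSL on port 465)
    server = smtplib.SMTP(smtp_url, 587)
    server.starttls()
    server.login(username, password)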

    # Send the email
    server.sendmail(msg["From"], msg["To"], msg.as_string())

    # Disconnect from the server
    server.quit()