I have an email inbox that I would like to ingest into Foundry, so that my users can send attachments to be processed in a pipeline.
Is there a way to ingest emails in Foundry, e.g. by monitoring an inbox and ingesting its content?
As of 4/30/24 there is no email connector. Your best bet is to use external transforms with Python to hit your mail server API, or to save the emails into an S3 bucket (or similar) and ingest them from there into Foundry.
As a workaround we are using AWS SES → S3 → Foundry
If you are connecting to Outlook, you can use the Microsoft Graph API via a REST source in Foundry.
There is also an available connector for Gmail.
As a reply to myself, here is a specific example with Exchange for instance:
https://community.palantir.com/t/ms-exchange-online-to-foundry/386/3
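For the Microsoft Graph route, here is a minimal sketch of the kind of request URL a REST source or external transform would call to list inbox messages. The helper name `build_graph_messages_url` and the mailbox address are hypothetical; the path and the `$select`, `$top`, and `$filter` query options follow the public Graph conventions, but which permissions and authentication flow you need depends on your tenant. It only builds the URL, so it does not touch the network.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_graph_messages_url(mailbox: str, only_with_attachments: bool = True, top: int = 25) -> str:
    """Build a Graph URL that lists inbox messages for one mailbox (hypothetical helper)."""
    params = {
        "$select": "id,subject,from,receivedDateTime,hasAttachments",
        "$top": str(top),
    }
    if only_with_attachments:
        # Only return messages that carry at least one attachment
        params["$filter"] = "hasAttachments eq true"
    # quote_via=quote encodes spaces as %20; "$" and "," are left readable
    query = urlencode(params, quote_via=quote, safe="$,")
    return f"{GRAPH_BASE}/users/{quote(mailbox, safe='@')}/mailFolders/inbox/messages?{query}"
```

The attachments themselves would then be fetched per message, which is a separate call and is not covered by this sketch.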
You are probably going to want some intermediary service. Yes, you could use the GMail JDBC driver, but that won’t work for all email providers, and I don’t think it is straightforward to understand.
I built a PoC of an email streaming pipeline using EmailEngine (https://github.com/postalsys/emailengine) as a data source.
If you have seen Nylas (https://www.nylas.com/) before, this service is similar but self-hostable and more barebones. EmailEngine can hook into an email client (Gmail, Outlook, etc.) in an OAuth fashion, and you can configure it to send webhooks to a service of your choosing (in this case, Foundry) based on inbox conditions, like receiving a new message.
To make this work, I needed to:
I hooked up my own GMail to it and created an ontology as a PoC.
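To give an idea of what the receiving side of such a webhook could look like, here is a rough sketch that flattens a "new message" event into one row. The field names (`event`, `account`, `data`, `from.address`, `attachments[].filename`) and the `messageNew` event name are assumptions about the payload shape, not something confirmed in this thread, so check them against a real payload from your EmailEngine instance. The function name `webhook_to_row` is made up for illustration.

```python
import json
from typing import Any, Dict, Optional


def webhook_to_row(raw_body: bytes) -> Optional[Dict[str, Any]]:
    """Flatten a new-message webhook body into one row; return None for other events."""
    payload = json.loads(raw_body)
    # ASSUMPTION: the event type is carried in an "event" key with value "messageNew"
    if payload.get("event") != "messageNew":
        return None
    # ASSUMPTION: message details live under "data"; missing keys become None / []
    data = payload.get("data") or {}
    sender = data.get("from") or {}
    attachments = data.get("attachments") or []
    return {
        "account": payload.get("account"),
        "message_id": data.get("id"),
        "subject": data.get("subject"),
        "sender": sender.get("address"),
        "attachment_names": [a.get("filename") for a in attachments],
    }
```

Rows like this could then be pushed to a stream or written to a dataset, depending on how the webhook is received on the Foundry side.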
Note: If you want to stream emails to Foundry, which is a different approach from batch ingestion but likely useful in some scenarios, you can likely use a compute module to host and run the Docker container in Foundry as well:
https://community.palantir.com/t/streaming-from-on-prem-dashboard-service/1402/3
Here are some basic code snippets, at least to test the connection. They won’t save anything to the output, but at least they let you test the “connection” alone.
import logging

# Imports as used by Foundry external transforms
from transforms.api import transform, Output
from transforms.external.systems import external_systems, Source

from myproject.datasets import utils_receive, utils_send


@external_systems(
    mail_source=Source("ri.magritte..source.XXXX")
)
@transform(
    out=Output(
        "/path/example_to_ingest"
    ),
)
def compute(mail_source, out):
    # Nothing is written to `out`; this transform only tests connectivity.
    # Extract information for the connection
    email = mail_source.get_secret("additionalSecretAccountEmail")
    password = mail_source.get_secret("additionalSecretAccountPassword")
    url = "URL_OF_EMAIL_SERVER"

    logging.info("trying to connect via IMAP SSL")
    utils_receive.connect_IMAP_SSL(email, password, url)  # 993

    logging.info("trying to connect via IMAP NO SSL")
    utils_receive.connect_IMAP_NO_SSL(email, password, url)  # 143

    logging.info("trying to connect via SMTP SSL")
    utils_send.connect_SMTP_SSL(email, password, url)
utils_receive.py
import email
import imaplib
from email.header import decode_header


def connect_IMAP_SSL(username="your_email@example.com", password="your_password", imap_url="imap.example.com"):
    # FOR TESTING ONLY (requires `import ssl`):
    # Create an SSL context that does not verify certificates
    #   context = ssl.create_default_context()
    #   context.check_hostname = False
    #   context.verify_mode = ssl.CERT_NONE
    # Create an IMAP4 class with SSL using the custom context
    #   mail = imaplib.IMAP4_SSL(imap_url, ssl_context=context)

    # Verbose protocol logging; this may expose the login command in the logs,
    # so turn it off outside of connection testing
    imaplib.Debug = 4
    # Create an IMAP4 class with SSL (default port 993)
    mail = imaplib.IMAP4_SSL(imap_url)
    # Authenticate
    mail.login(username, password)
    # Select the mailbox you want to check
    mail.select("inbox")
    # Search for specific emails
    status, messages = mail.search(None, "ALL")
    # Convert messages to a list of email IDs
    email_ids = messages[0].split()
    for email_id in email_ids:
        # Fetch the email by ID
        status, msg_data = mail.fetch(email_id, "(RFC822)")
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                # Decode the email subject (it may be missing entirely)
                subject, encoding = decode_header(msg["Subject"] or "")[0]
                if isinstance(subject, bytes):
                    subject = subject.decode(encoding if encoding else "utf-8")
                print("Subject:", subject)
    # Close the connection and logout
    mail.close()
    mail.logout()


def connect_IMAP_NO_SSL(username="your_email@example.com", password="your_password", imap_url="imap.example.com"):
    # Create an IMAP4 class without SSL (default port 143)
    mail = imaplib.IMAP4(imap_url)
    # Authenticate
    mail.login(username, password)
    # Select the mailbox you want to check
    mail.select("inbox")
    # Search for specific emails
    status, messages = mail.search(None, "ALL")
    # Convert messages to a list of email IDs
    email_ids = messages[0].split()
    for email_id in email_ids:
        # Fetch the email by ID
        status, msg_data = mail.fetch(email_id, "(RFC822)")
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                # Decode the email subject (it may be missing entirely)
                subject, encoding = decode_header(msg["Subject"] or "")[0]
                if isinstance(subject, bytes):
                    subject = subject.decode(encoding if encoding else "utf-8")
                print("Subject:", subject)
    # Close the connection and logout
    mail.close()
    mail.logout()
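The functions above search with "ALL", which re-reads the whole inbox on every run. For a scheduled ingest you would probably want to narrow that down. Here is a small sketch of a helper that builds an IMAP search string for unseen messages since a given date; the name `build_search_criteria` is hypothetical, and whether "unseen" is the right notion of "new" depends on how the mailbox is used.

```python
from datetime import date
from typing import Optional

# IMAP dates use English month abbreviations regardless of locale
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(since: Optional[date] = None, unseen_only: bool = True) -> str:
    """Build an IMAP SEARCH criteria string (hypothetical helper)."""
    parts = []
    if unseen_only:
        parts.append("UNSEEN")
    if since is not None:
        # IMAP date format is DD-Mon-YYYY, e.g. 01-Apr-2024
        parts.append(f"SINCE {since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}")
    # Fall back to ALL when no filter was requested
    return "(" + " ".join(parts) + ")" if parts else "ALL"
```

It would be used as `mail.search(None, build_search_criteria(date(2024, 4, 1)))`, which sends the criteria `(UNSEEN SINCE 01-Apr-2024)`.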
utils_send.py
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def connect_SMTP_SSL(username="your_email@example.com", password="your_password", smtp_url="smtp.example.com", recipient="myemail@domain.com"):
    # Create the email
    msg = MIMEMultipart()
    msg["From"] = username
    msg["To"] = recipient
    msg["Subject"] = "Test Email"
    # Body of the email
    body = "This is a test email from Python script."
    msg.attach(MIMEText(body, "plain"))
    # Connect to the SMTP server.
    # Despite the function name, this uses STARTTLS on port 587,
    # not implicit SSL (smtplib.SMTP_SSL, usually port 465).
    server = smtplib.SMTP(smtp_url, 587)
    server.starttls()
    server.login(username, password)
    # Send the email
    server.sendmail(msg["From"], msg["To"], msg.as_string())
    # Disconnect from the server
    server.quit()
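Since the original goal was to process attachments, here is a minimal sketch of how they could be pulled out of a raw message once you have its bytes (for example `response_part[1]` from the IMAP fetch, or a raw email file landed in S3). It only uses the standard library `email` package. The names `extract_attachments` and `build_sample_message` are made up for illustration, and writing the bytes to a Foundry output is left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Tuple


def extract_attachments(raw_message: bytes) -> List[Tuple[str, str, bytes]]:
    """Return (filename, content_type, payload_bytes) for each attachment."""
    # policy.default yields an EmailMessage, which provides iter_attachments()
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for index, part in enumerate(msg.iter_attachments()):
        # Fall back to a generated name when the sender omitted the filename
        filename = part.get_filename() or f"attachment_{index}"
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        found.append((filename, part.get_content_type(), payload))
    return found


def build_sample_message() -> bytes:
    """Build a small test email with one binary attachment (demo data only)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "inbox@example.com"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg.as_bytes()
```

Calling `extract_attachments(build_sample_message())` gives a single entry for `data.bin`; the text body is not treated as an attachment.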