SharePoint Metadata

I want to get the metadata of the files stored in a SharePoint folder. I have an existing data connection; can I somehow leverage that connection, or will I need to use the SharePoint API via some agent?
Do you have any example I can follow?

Hi @hbhatia8 ,

Independently of how you obtain the files in the SharePoint drive, you can always run a Pipeline Builder (PB) pipeline to extract the metadata.

In my case, I created a direct connection, following this documentation, and synced the files in a SharePoint directory to a dataset in Foundry.
Because it uses a file-based sync, you can apply any filters you might need:

  • Exclude files already synced: Only sync files that were added or modified in size or date since the last sync.
  • Path matches: Only sync files with a path (relative to the root of the connector) that matches the regular expression.
  • Path does not match: Only sync files with a path (relative to the root of the connector) that does not match the regular expression.
  • Last modified after: Only sync files that have been modified after a specified date and time.
  • File size is between: Only sync files with a size between the specified minimum and maximum byte value.
  • Any file has path matching: If any file has a relative path matching the regular expression, sync all files in the subfolder that are not otherwise filtered.
  • At least N files: Sync all filtered files only if there are at least N files remaining.
  • Limit number of files: Limit the number of files to keep per transaction. This option can increase the reliability of incremental syncs.
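To make the filter semantics concrete, here is a minimal sketch in plain Python of how a few of these filters combine. It is an illustration only, not Foundry code: the `FileInfo` class, the `passes_filters` function, and the sample paths are hypothetical, and matching the regular expression against the whole relative path is an assumption rather than documented connector behavior.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical stand-in for one file listed by the connector."""
    path: str            # path relative to the root of the connector
    size: int            # size in bytes
    modified: datetime   # last modified timestamp


def passes_filters(
    f: FileInfo,
    path_matches: Optional[str] = None,
    path_does_not_match: Optional[str] = None,
    modified_after: Optional[datetime] = None,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
) -> bool:
    """Return True if the file survives every configured filter.

    Assumption: regular expressions are matched against the whole path.
    """
    if path_matches and not re.fullmatch(path_matches, f.path):
        return False
    if path_does_not_match and re.fullmatch(path_does_not_match, f.path):
        return False
    if modified_after and f.modified <= modified_after:
        return False
    if min_size is not None and f.size < min_size:
        return False
    if max_size is not None and f.size > max_size:
        return False
    return True


# Made-up sample listing.
files = [
    FileInfo("reports/2024/q1.xlsx", 20480, datetime(2024, 4, 2)),
    FileInfo("reports/2024/~$q1.xlsx", 165, datetime(2024, 4, 2)),
    FileInfo("archive/old.docx", 51200, datetime(2021, 1, 15)),
]

selected = [
    f.path
    for f in files
    if passes_filters(
        f,
        path_matches=r"reports/.*\.xlsx",   # "Path matches"
        path_does_not_match=r".*~\$.*",     # "Path does not match" (Office lock files)
        modified_after=datetime(2024, 1, 1),  # "Last modified after"
    )
]
print(selected)
```

All filters are applied together, so a file is synced only if it passes each one that is configured.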

This is awesome, I had a bit of a follow-up to it.
I did get the metadata as a dataset.

In this dataset, it is possible that some files will be modified (say, within a week or a month). By modified I mean that some files may be updated or deleted, and some new files may be added. I want to detect these files by comparing the current metadata with the previous metadata (using columns like path, modified, and size). Can that be done in Pipeline Builder?

Yes, these can be configured in the Data Connection and Folder Sync:

Supported capabilities

Capability      Status
Exploration     🟢 Generally available
Bulk import     🟢 Generally available
Incremental     🟢 Generally available
Export tasks    🟡 Sunset
File exports    🟢 Generally available

You should be able to configure how you want to process updates, additions, etc. Unfortunately, I don't have a screenshot to share, but the options should be in the configuration of the SharePoint source.
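For the comparison itself, the logic asked about in the question can be sketched in plain Python as below. This is not Pipeline Builder output; the function name `diff_snapshots` and the sample rows are hypothetical, and only the column names `path`, `modified`, and `size` come from the question. It assumes each snapshot lists every file present at that time, since a snapshot containing only changed files cannot reveal deletions.

```python
def diff_snapshots(previous, current):
    """Compare two metadata snapshots keyed by path.

    Each snapshot is a list of dicts with 'path', 'modified', and 'size'.
    Assumption: both snapshots are full listings of the folder.
    """
    prev = {row["path"]: row for row in previous}
    curr = {row["path"]: row for row in current}

    # Paths only in the current snapshot are new files.
    added = sorted(curr.keys() - prev.keys())
    # Paths only in the previous snapshot were deleted.
    deleted = sorted(prev.keys() - curr.keys())
    # Paths in both snapshots count as modified if timestamp or size differs.
    modified = sorted(
        p
        for p in curr.keys() & prev.keys()
        if (curr[p]["modified"], curr[p]["size"])
        != (prev[p]["modified"], prev[p]["size"])
    )
    return {"added": added, "modified": modified, "deleted": deleted}


# Made-up sample snapshots.
previous = [
    {"path": "docs/a.docx", "modified": "2024-01-10T09:00:00", "size": 1200},
    {"path": "docs/b.xlsx", "modified": "2024-01-11T10:30:00", "size": 5400},
    {"path": "docs/c.pdf", "modified": "2024-01-12T08:15:00", "size": 9100},
]
current = [
    {"path": "docs/a.docx", "modified": "2024-01-10T09:00:00", "size": 1200},
    {"path": "docs/b.xlsx", "modified": "2024-02-01T14:00:00", "size": 5600},
    {"path": "docs/d.pptx", "modified": "2024-02-03T16:45:00", "size": 30000},
]

changes = diff_snapshots(previous, current)
print(changes)
```

In a pipeline, the same idea corresponds roughly to a full outer join of the previous and current metadata on path: rows with no match on one side are additions or deletions, and matched rows with a different modified or size value are updates.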

Please let me know if this helps.
