This section explains how to access data from input files in model steps for analysis or in other workflow steps such as code steps.

Prerequisites

When working with file documents (PDFs, images, etc.), Cortex must extract the data before it can be used in model steps or other workflow steps. To access file contents or images, enable the corresponding options in the file field settings, depending on what you want to access.

If these options are not enabled, attempts to access contents or images return empty results.

Accessing File Data in Model Messages

There are three methods to access file data in model messages:

  1. Using :contents - Passes the complete text content of the document to the model

    {{input.document:contents}}
    
  2. Using :images - Passes the file's pages as images (e.g., the pages of a PDF) to the model

    {{input.document:images}}
    
  3. Using :file - Passes raw files (like audio or video) directly to compatible models (e.g., Gemini for video or audio files)

    {{input.video:file}}
    

These suffixes also work with file fields that accept multiple files. When a file field has the “multiple” option enabled, the suffixes pass data from all files to the model:

  • :contents passes the text content from all files
  • :images passes all images/pages from all files
  • :file passes all raw files

Cortex automatically converts these placeholders into actual document data before sending to the model.

These suffixes can only be used in model messages, wrapped in double curly braces.
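To make the placeholder syntax concrete, here is a minimal sketch that scans a message template for file placeholders and flags unknown suffixes. It is not Cortex's own parser: the function name `findFilePlaceholders`, the allowed field-name characters, and the tolerance for spaces inside the braces are all assumptions made for illustration.

```javascript
// Hypothetical helper: lists the file placeholders found in a model message.
// This only mirrors the {{input.<field>:<suffix>}} syntax described above;
// it is not how Cortex itself resolves placeholders.
const FILE_SUFFIXES = ["contents", "images", "file"];

function findFilePlaceholders(message) {
  // Assumption: field names are identifier-like, and spaces inside the braces are tolerated.
  const pattern = /\{\{\s*input\.([A-Za-z_][A-Za-z0-9_]*):([A-Za-z]+)\s*\}\}/g;
  const found = [];
  for (const match of message.matchAll(pattern)) {
    const [, field, suffix] = match;
    // `valid` is true only for the three suffixes documented in this section.
    found.push({ field, suffix, valid: FILE_SUFFIXES.includes(suffix) });
  }
  return found;
}

const message =
  "Summarize {{input.document:contents}} and describe {{input.video:file}}.";
console.log(findFilePlaceholders(message));
```

A check like this could be run on a template before saving it, to catch a mistyped suffix early.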

Guidelines for Choosing Between Contents and Images

  • Use :contents for:

    • Long documents where cost is a concern
    • Simple analysis tasks
  • Use :images for:

    • Complex documents
    • Forms
    • Documents with important visual elements
    • Tasks requiring high accuracy

Accessing File Data in Code Steps

Use the utility functions util.getFileContents and util.getFileImages to access file contents and images:

CODE_STEP
const documentContents = util.getFileContents(input.document);
const documentImages = util.getFileImages(input.document);

File Content Structure

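The value returned by util.getFileContents depends on whether multiple file uploads are enabled: a single-file field returns an array of content items, while a multiple-file field returns one such array per file. Each content item has the following structure: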

{
  id: number;
  type: string;
  content: string;
  content_as_html?: string;
  page: number;
  parent?: number;
}
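As an illustration of working with this structure, the sketch below groups content items by their `page` field and joins the text of each page. The sample data is invented (including the `type` values), and `textByPage` is a hypothetical helper, not part of the Cortex utilities.

```javascript
// Sample data shaped like the structure above; all values are invented for illustration.
const sampleContents = [
  { id: 1, type: "title", content: "Invoice", page: 1 },
  { id: 2, type: "paragraph", content: "Total due: 120 USD", page: 1, parent: 1 },
  { id: 3, type: "paragraph", content: "Payment terms: 30 days", page: 2 },
];

// Hypothetical helper: collects the text of each item under its page number,
// then joins the lines of every page with newlines.
function textByPage(contents) {
  const pages = new Map();
  for (const item of contents) {
    const lines = pages.get(item.page) ?? [];
    lines.push(item.content);
    pages.set(item.page, lines);
  }
  return new Map([...pages].map(([page, lines]) => [page, lines.join("\n")]));
}

const pages = textByPage(sampleContents);
console.log(pages.get(1));
```

In a code step, the array returned by `util.getFileContents(input.document)` for a single-file field could be passed in place of `sampleContents`.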

Single file upload example:

CODE_STEP
const contents = util.getFileContents(input.document);
contents.forEach((content) => {
  console.log(`Page ${content.page}: ${content.content}`);
});

Multiple file upload example:

CODE_STEP
const contents = util.getFileContents(input.document);
contents.forEach((fileContents, fileIndex) => {
  fileContents.forEach((content) => {
    console.log(
      `File ${fileIndex + 1}, Page ${content.page}: ${content.content}`
    );
  });
});
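Because the two shapes differ only in nesting, a code step that must handle both can normalize the result first. The following is a sketch under that assumption; `toFileList` is a hypothetical helper, and it infers the shape by checking whether the first element is itself an array.

```javascript
// Hypothetical helper: always returns an array of per-file content arrays,
// whether the input came from a single-file or a multiple-file field.
function toFileList(contents) {
  if (!Array.isArray(contents) || contents.length === 0) return [];
  // Assumption: a nested array as the first element means the multi-file shape.
  return Array.isArray(contents[0]) ? contents : [contents];
}

// Invented sample data in both shapes.
const singleFile = [{ id: 1, type: "paragraph", content: "Hello", page: 1 }];
const multipleFiles = [
  [{ id: 1, type: "paragraph", content: "First file", page: 1 }],
  [{ id: 1, type: "paragraph", content: "Second file", page: 1 }],
];

for (const input of [singleFile, multipleFiles]) {
  toFileList(input).forEach((fileContents, fileIndex) => {
    fileContents.forEach((content) => {
      console.log(`File ${fileIndex + 1}, Page ${content.page}: ${content.content}`);
    });
  });
}
```

Note that an empty result is returned as an empty list in this sketch, so a file with no extracted content is indistinguishable from no files at all.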

File Images Structure

File images are returned as an array of image URLs:

CODE_STEP
const images = util.getFileImages(input.document); // images = ['https://...', 'https://...']

For multiple files:

CODE_STEP
input.document.forEach((file) => {
  const fileImages = util.getFileImages(file);
  console.log('Images for file:', fileImages);
});
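When images from several files need to be processed together, the per-file loop can be folded into a single flat list. This sketch assumes nothing about Cortex beyond what is shown above: `collectImages` is a hypothetical helper, and the image getter is passed in as a parameter so the example runs outside Cortex with stub data.

```javascript
// Hypothetical helper: gathers image URLs for one file or an array of files.
// `getImages` stands in for util.getFileImages so the sketch runs outside Cortex.
function collectImages(fileOrFiles, getImages) {
  const files = Array.isArray(fileOrFiles) ? fileOrFiles : [fileOrFiles];
  return files.flatMap((file, fileIndex) =>
    // imageIndex is only the position in the returned array; treating it as a
    // page number would be an assumption.
    getImages(file).map((url, imageIndex) => ({ fileIndex, imageIndex, url }))
  );
}

// Stub files and getter, invented for illustration only.
const fakeFiles = [{ name: "a.pdf" }, { name: "b.pdf" }];
const fakeGetImages = (file) => [
  `https://example.com/${file.name}/1.png`,
  `https://example.com/${file.name}/2.png`,
];

console.log(collectImages(fakeFiles, fakeGetImages));
```

In a code step, the call would presumably look like `collectImages(input.document, (file) => util.getFileImages(file))`.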

Simple Examples

  1. Audio to Lyrics Converter: Takes a song audio file and outputs the lyrics.

  2. Video Summarizer: Processes a video file and generates a summary.

  3. PDF Question-Answer System: Analyzes multiple PDF files to answer questions.

Troubleshooting

Here are some common issues and their solutions:

Parse error when uploading a file

Parse errors during file upload typically occur when the file type does not match the parsing settings enabled on the file field:

  1. File Type Compatibility:

    • Text/content extraction works with: PDF, Word docs, text files
    • Image extraction works with: PDF, image files (PNG, JPG, etc.)
    • Video/audio files cannot be parsed for text or images
  2. Common Error Scenario: If you’ve enabled content or image extraction in the file field settings (as shown in prerequisites), but upload an incompatible file type like video or audio, you’ll receive a parse error.

To resolve this, make sure the file field settings match the types of files you plan to upload. Content and image extraction do not work for video and audio files, so leave both settings disabled for fields that accept them.
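The compatibility rules above can be expressed as a small pre-upload check. This is only a sketch: `checkExtraction` is a hypothetical helper, and the extension lists are assumptions derived from the file types named in this section, not an official list of what Cortex supports.

```javascript
// Assumed extension lists, based on the file types named above.
const TEXT_EXTENSIONS = ["pdf", "doc", "docx", "txt"];
const IMAGE_EXTENSIONS = ["pdf", "png", "jpg", "jpeg"];

// Hypothetical helper: returns a list of likely parse problems for a file name,
// given which extraction settings are enabled on the file field.
function checkExtraction(filename, settings) {
  const ext = filename.includes(".") ? filename.split(".").pop().toLowerCase() : "";
  const problems = [];
  if (settings.contents && !TEXT_EXTENSIONS.includes(ext)) {
    problems.push(`content extraction is enabled but .${ext} files have no extractable text`);
  }
  if (settings.images && !IMAGE_EXTENSIONS.includes(ext)) {
    problems.push(`image extraction is enabled but .${ext} files have no extractable images`);
  }
  return problems;
}

console.log(checkExtraction("song.mp3", { contents: true, images: false }));
console.log(checkExtraction("report.pdf", { contents: true, images: true }));
```

An empty list means the file type is consistent with the enabled settings under these assumptions; it does not guarantee that parsing will succeed.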