Applying NLP Skills to Documents

CARE allows administrators to apply NLP skills to files (currently documents and submissions) in batches that share the same configuration. This feature enables preprocessing large file collections with configured NLP models and automatically storing the results for further analysis.

When to Use Skill Application

Use the skill application feature to:

  • Preprocess submissions with NLP models before study participants begin, to avoid latency during interaction

  • Validate NLP model outputs for different inputs and scenarios

  • Prepare ground truth data for model evaluation studies

  • Generate baseline predictions for human-in-the-loop workflows (e.g., Peer Review Workflow with AI)

  • Pre-generate NLP results for documents that have human annotations, to compare human and model annotations

  • Run comparison studies with and without NLP support on the same platform, avoiding the external factors that separate setups would introduce

Accessing Skill Application

Skill application is available in the Dashboard under the Submissions component.

Note

Only administrators can access skill application. The skill selection displays only NLP skills that are online; if no skills are online, fallback skills are shown instead. Regular users cannot initiate or monitor preprocessing tasks.

To open the skill application interface:

  1. Log in as an administrator

  2. Navigate to the Dashboard

  3. In the left sidebar, click Submissions

  4. Click on Apply Skills

The Apply Skills modal will open with a step-by-step wizard.

Configuration Steps

Step 1: Select Skill

Choose an NLP skill from the dropdown. The available skills are shown with their status (online/offline). To understand what a skill does before selecting it, check the NLP Skills component in the Dashboard, which shows:

  • Input parameters and expected input formats

  • Output fields and their meanings

  • Full configuration details
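
For orientation, a skill entry in that component might look roughly like the following sketch. The field and parameter names are illustrative, and the skill name is borrowed from the storage-key example later in this section; none of this is the exact CARE schema.

    # Hypothetical sketch of a skill descriptor as displayed in the
    # NLP Skills component; field names are illustrative.
    skill = {
        "name": "grading_expose_assessment",  # skill identifier
        "status": "online",                   # online/offline
        "inputs": [
            {"name": "expose", "type": "document"},       # table-based data source
            {"name": "rubric", "type": "configuration"},  # assessment configuration
        ],
        "outputs": [
            {"name": "grade", "meaning": "predicted grade"},
            {"name": "feedback", "meaning": "generated feedback text"},
        ],
    }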

Step 2: Map Parameters to Data Sources

For each skill input parameter, select a data source:

  • Submission: All documents from a submission (for multi-document analyses stored as a single submission)

  • Document: A specific individual document

  • Configuration: A configuration of type assessment

Note

You can select only one table-based data source (submission or document) in total across all parameters.
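
As a minimal sketch of that rule, a parameter mapping could be validated as follows (parameter names are hypothetical):

    # Hypothetical parameter-to-data-source mapping for a skill with two inputs.
    TABLE_BASED = {"submission", "document"}

    mapping = {
        "expose": "submission",     # table-based source: the files to iterate over
        "rubric": "configuration",  # an assessment-type configuration
    }

    # Exactly one parameter may point to a table-based source (see the note above).
    table_sources = [src for src in mapping.values() if src in TABLE_BASED]
    assert len(table_sources) == 1, "select exactly one table-based data source"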

Step 3: Select Files to Process

Choose which specific files (submissions or documents) to apply the skill to. The selection interface shows available files grouped by validation configuration. You must select at least one file.
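
The grouping used by the selection interface can be pictured as follows (a sketch; the validationConfigId field name is an assumption, not the CARE schema):

    from collections import defaultdict

    # Hypothetical sketch: group selectable files by their validation
    # configuration, mirroring how the selection interface presents them.
    def group_by_validation_config(files):
        groups = defaultdict(list)
        for file in files:
            groups[file["validationConfigId"]].append(file)
        return groups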

Step 4: Select Base Files (submission-based parameter selections only)

If the skill operates on submissions (not individual documents), you must specify which document type to save results to, since submissions contain multiple documents.

For each group (groups share a validation configuration), choose the target document type for that submission group (pdf, html, modal, etc.).

Note

This step only appears for submission-based skills. Document-based skills save directly to the selected document.
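
Conceptually, the base file choice tells CARE which document inside each submission receives the results. A rough sketch, assuming each document carries a type field:

    # Hypothetical sketch: pick the document of the configured type from a
    # submission. If no document matches, there is nothing to save results
    # to (compare "No results stored" under Error Handling below).
    def pick_target_document(submission_documents, target_type):
        for document in submission_documents:
            if document["type"] == target_type:  # e.g. "pdf" or "html"
                return document
        return None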

Step 5: Review and Confirm

Review the configuration summary showing:

  • Selected skill and parameters

  • Number of files selected

  • Total number of requests to be created

Click Apply Skills to submit.

Monitoring and Results

During Processing:

After submission, the modal shows real-time progress:

  • Progress bar (X / Y requests completed)

  • Current request elapsed time

  • Estimated time remaining

  • Queue of submissions still waiting

You can close the modal; processing continues in the background. Re-open Apply Skills anytime to check progress.
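
The estimated time remaining shown above is best read as an extrapolation from the requests completed so far; a sketch of such an estimate (not necessarily CARE's exact formula):

    import time

    # Hypothetical sketch: extrapolate the remaining time from the average
    # duration of the requests completed so far.
    def eta_seconds(batch_start, completed, total):
        if completed == 0:
            return None  # nothing finished yet, no estimate possible
        elapsed = time.time() - batch_start
        return (elapsed / completed) * (total - completed)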

Cancelling Preprocessing:

Click Cancel Preprocess to stop remaining requests. Already-completed requests remain saved in the database.

Reviewing Results:

Results are automatically saved to documents in the document_data table with keys like:

nlpRequest_grading_expose_assessment

Access results by querying the document_data table for the corresponding combination of documentId, studySessionId, studyStepId, and key. If skill application is integrated into a study, results can also be accessed via the component’s data sources while configuring the study.
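
For manual inspection, fetching a stored result amounts to a lookup on that key combination. A sketch assuming a SQLite-style driver, a value column, and an nlpRequest_<skillName> key pattern; adapt all three to your actual setup:

    import sqlite3  # illustrative driver; any DB-API connection works similarly

    # Hypothetical sketch: look up a stored NLP result in document_data.
    def fetch_nlp_result(conn, document_id, study_session_id, study_step_id, skill_name):
        key = f"nlpRequest_{skill_name}"  # e.g. nlpRequest_grading_expose_assessment
        row = conn.execute(
            "SELECT value FROM document_data"
            " WHERE documentId = ? AND studySessionId = ?"
            " AND studyStepId = ? AND key = ?",
            (document_id, study_session_id, study_step_id, key),
        ).fetchone()
        return row[0] if row else None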

Error Handling

If processing fails for specific items:

  • The error is logged to the backend and processing continues with the next item (see the sketch after this list)

  • Check backend logs for details
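
The log-and-continue policy can be sketched like this (hypothetical structure; the actual backend code may differ):

    import logging

    logger = logging.getLogger("skill_application")

    # Hypothetical sketch: a failing request is logged to the backend and
    # the batch moves on to the next item instead of aborting.
    def process_batch(items, apply_skill):
        for item in items:
            try:
                apply_skill(item)
            except Exception:
                logger.exception("Skill application failed for %s; continuing", item)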

Common issues:

  • Files missing: The file was deleted from disk or from the database

  • Processing latency: The skill is taking too long; check the cancellation view or backend logs for details

  • NLP timeout: The skill exceeded the configured timeout; increase the timeout in the settings

  • Skill offline: Check the NLP Skills component for the skill’s status

  • No results stored: Verify that the base file selection was correct

Best Practices

  • Correct selections: Ensure the right skill is selected, input parameters are mapped to the correct data sources, and files and base files are chosen appropriately

  • Start small: Test with 1-5 submissions first to verify the skill configuration and functionality before running large batches

  • Review results: Always manually check initial results to assess model quality and correct functionality in CARE

  • Monitor time: Watch the elapsed time; if a request exceeds its expected duration, something may be wrong

  • Document your runs: Note which skills ran on which submissions and when, for reproducibility

  • Check permissions: Only admins can run skill application; results are stored under your userId

Troubleshooting

No skills appear in dropdown
  • The Skill Broker is offline or no models are connected (only fallback skills are visible).

  • Check the NLP Skills component.

Can’t proceed past Step 2
  • You must map at least one parameter to a table-based source (submission or document).

Error during processing
  • Make sure that the selected inputs match the skill’s requirements.

  • Check backend logs for specific error messages related to the skill or data.

Processing takes very long
  • Skill may be slow or offline.

  • Check the cancellation view, the backend logs, or the broker for errors. Cancel and retry if needed.

Results don’t appear in document_data
  • Verify base file selection was correct.

  • Check that the documentId exists in the database.

Results look incorrect
  • Test the skill manually using the NLP Skills message interface. Compare against expected output.