Exporting Data
In this chapter we cover the basics of exporting data for research, as well as the format of the exported data. Generally, there are two types of data that can be exported from CARE: the inline commentary data, including highlighted text, comment text and metadata, and the behavioral user data, including interactions with certain UI elements, PDF page scrolling, comment revisions, etc.
Exporting Inline Commentary
How to export
In general, you can export all inline commentary data that
was created on a document that you uploaded (you are the owner)
was created by anyone, including replies from the NLP assistance
was created in studies or in regular reading mode
was not deleted
was not left in draft mode (never submitted after initial selection)
Note
In the future, the data export feature will be extended so that you can configure which types of data are exported, including draft or deleted annotations. If you need such annotations now, you have to access the “annotation” and “comment” tables of the database directly.
To realize an export, you have two options:
Open a document, click the “…” button on the right of the top bar and select “Export annotations”.
In the Documents component of the dashboard, click the “Export all” button at the top right of the documents table.
If the given documents contain no inline commentary, no data is downloaded. If an error occurs, an error message is shown to the user.
Data format
An export produces one or two .json files per document. The hash in the downloaded file name identifies the document and is identical to the hash in the document's URL. If the document has highlights, the first file, ending in _annotations.json, contains these highlights with their associated comments and replies. If the document has document comments, i.e. comments not anchored to a text span, a second file ending in _notes.json contains these notes with all of their replies.
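When you export several documents at once, it helps to pair each document's files before analysis. The following is a minimal sketch, assuming all downloaded files were saved in one directory and that each file name consists of a prefix containing the document hash followed by _annotations.json or _notes.json (the exact prefix format is an assumption). The function name load_exports is hypothetical.

```python
import json
from pathlib import Path

# Map each assumed file-name suffix to the kind of content it holds.
SUFFIXES = {"_annotations.json": "annotations", "_notes.json": "notes"}


def load_exports(directory):
    """Group exported files by the file-name prefix preceding the suffix.

    Assumption: names look like '<prefix>_annotations.json' or
    '<prefix>_notes.json', where the prefix contains the document hash.
    Returns {prefix: {"annotations": [...], "notes": [...]}}.
    """
    exports = {}
    for path in sorted(Path(directory).glob("*.json")):
        for suffix, kind in SUFFIXES.items():
            if path.name.endswith(suffix):
                prefix = path.name[: -len(suffix)]
                entry = exports.setdefault(prefix, {"annotations": [], "notes": []})
                with path.open(encoding="utf-8") as f:
                    entry[kind] = json.load(f)
                break  # a file matches at most one suffix
    return exports
```

For example, data = [a for e in load_exports("exports").values() for a in e["annotations"]] yields a single list of parsed annotation objects, which is the shape the filtering snippets in this chapter operate on. Documents without notes simply keep an empty "notes" list.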
The structure of the annotation export file is illustrated by the following example:
[
{
"text": "<highlighted span>",
"id": 1,
"documentId": 1,
"createdAt": "2023-02-21T15:21:19.802Z",
"updatedAt": "2023-02-21T15:21:21.478Z",
"studySessionId": null,
"userId": "guest",
"tag": "Strength",
"studyId": null,
"comment": {
"id": 1,
"text": null,
"documentId": 1,
"createdAt": "2023-02-21T15:21:19.808Z",
"updatedAt": "2023-02-21T15:21:21.481Z",
"studySessionId": null,
"userId": "guest",
"tags": [],
"annotationId": 1,
"parentCommentId": null,
"studyId": null,
"replies": [
...
]
}
},
...
]
The comment field contains the top-level comment of an annotation, whereas the replies to that comment are provided as a list under replies. Each reply follows the same structure as the top-level comment, which allows for arbitrarily nested reply trees.
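Because each reply has the same structure as the top-level comment, a short recursive function can walk a reply tree of any depth. The sketch below flattens the comment thread of one exported annotation into rows, e.g. for loading into a table. The function names and the row layout are illustrative only, and the code assumes that leaf comments may either omit replies or carry an empty list.

```python
def iter_comments(comment, depth=0):
    """Yield (depth, comment) for a comment and all nested replies, depth-first.

    Depth 0 is the top-level comment of the annotation.
    """
    yield depth, comment
    for reply in comment.get("replies") or []:
        yield from iter_comments(reply, depth + 1)


def flatten_thread(annotation):
    """Return all comments attached to one exported annotation as flat rows."""
    top = annotation.get("comment")
    if top is None:
        return []
    return [
        {
            "annotationId": annotation["id"],
            "depth": depth,
            "commentId": c["id"],
            "parentCommentId": c.get("parentCommentId"),
            "userId": c.get("userId"),
            "text": c.get("text"),
        }
        for depth, c in iter_comments(top)
    ]
```

Keeping parentCommentId in each row means the tree can be rebuilt later, while depth makes it easy to separate top-level comments from replies.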
Data post-processing
Depending on your use case, you should consider different filtering strategies.
Filtering by study. To get all annotations of a specific study, filter by the studyId field:
# data holds list of parsed json objects
# we are interested in the study with id 1
study_annotations = [d for d in data if d["studyId"] == 1]
Filtering by user. To get all annotations of a specific user, filter by the userId field:
# data holds list of parsed json objects
# we are interested in the annotations of the user "guest"
user_annotations = [d for d in data if d["userId"] == "guest"]
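Both filters can also be applied in a single pass. The following minimal sketch groups annotations by study and then by user. It assumes, based on the example export, that annotations created in regular reading mode carry studyId null (None in Python). It only looks at the userId of the annotation itself, not at the authors of attached comments and replies. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_study_and_user(data):
    """Group exported annotations into {studyId: {userId: [annotations]}}.

    Assumption: annotations created outside a study have studyId None.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for annotation in data:
        groups[annotation.get("studyId")][annotation.get("userId")].append(annotation)
    # Convert to plain dicts so that lookups of missing keys raise KeyError
    # instead of silently creating empty entries.
    return {study: dict(users) for study, users in groups.items()}
```

For example, group_by_study_and_user(data)[1]["guest"] corresponds to combining both filters shown above.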
Exporting Behavioral User Data
Behavioral user data is sensitive and is therefore only accessible to administrators. To export it, log in to the system as an administrator, open the “User Statistics” view of the dashboard and click “Export all” on the user table.
The resulting stats export structure is illustrated by the following example:
[
{
"id": 1,
"action": "routeStep",
"data": "{\"from\":\"/\",\"to\":\"/dashboard\"}",
"userId": 1,
"timestamp": "2023-02-23T15:57:08.693Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:08.693Z",
"updatedAt": "2023-02-23T15:57:08.693Z"
},
{
"id": 7,
"action": "openUploadModal",
"data": "{}",
"userId": 1,
"timestamp": "2023-02-23T15:57:33.065Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:33.065Z",
"updatedAt": "2023-02-23T15:57:33.065Z"
},
{
"id": 9,
"action": "actionClick",
"data": "{\"action\":\"accessDoc\",\"params\":{...}]}}",
"userId": 1,
"timestamp": "2023-02-23T15:57:48.734Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:48.734Z",
"updatedAt": "2023-02-23T15:57:48.734Z"
},
{
"id": 10,
"action": "routeStep",
"data": "{\"from\":\"/dashboard/documents\",\"to\":\"/document/...\"}",
"userId": 1,
"timestamp": "2023-02-23T15:57:49.446Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:49.446Z",
"updatedAt": "2023-02-23T15:57:49.446Z"
},
{
"id": 11,
"action": "pdfPageVisibilityChange",
"data": "{\"documentId\":2,\"readonly\":false,\"visibility\":{\"pageNumber\":1,\"isVisible\":true,\"offset\":17.5},\"studySessionId\":null}",
"userId": 1,
"timestamp": "2023-02-23T15:57:49.854Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:49.854Z",
"updatedAt": "2023-02-23T15:57:49.854Z"
},
{
"id": 14,
"action": "annotatorScrollActivity",
"data": "{\"documentId\":2,\"scrollTop\":510,\"scrollHeight\":2872}",
"userId": 1,
"timestamp": "2023-02-23T15:57:51.141Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:51.141Z",
"updatedAt": "2023-02-23T15:57:51.141Z"
},
{
"id": 15,
"action": "pdfPageVisibilityChange",
"data": "{\"documentId\":2,\"readonly\":false,\"visibility\":{\"pageNumber\":2,\"isVisible\":true,\"offset\":1436.5},\"studySessionId\":null}",
"userId": 1,
"timestamp": "2023-02-23T15:57:51.307Z",
"deleted": false,
"deletedAt": null,
"createdAt": "2023-02-23T15:57:51.307Z",
"updatedAt": "2023-02-23T15:57:51.307Z"
},
...
]
Each interaction is recorded with its type (action) and its metadata (data), which together fully describe the action the user performed at a point in time (timestamp). Note that data is stored as a JSON-encoded string. Based on these traces, you can infer higher-level event sequences and analyze usage timings.
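As a starting point for such analyses, the following minimal sketch decodes the data field of each entry, orders the entries of each user chronologically and computes the time elapsed since that user's previous action. It assumes that the export has been parsed into a list of dicts shaped like the example above. The function names are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime


def parse_timestamp(value):
    # Timestamps in the export end in "Z" (UTC); datetime.fromisoformat only
    # accepts "Z" from Python 3.11 on, so replace it with an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def user_traces(stats):
    """Return {userId: [(time, action, data, seconds_since_previous), ...]}.

    Each user's events are sorted by timestamp; the first event has no
    predecessor, so its gap is None.
    """
    per_user = defaultdict(list)
    for entry in stats:
        data = json.loads(entry["data"] or "{}")  # data is a JSON-encoded string
        per_user[entry["userId"]].append(
            (parse_timestamp(entry["timestamp"]), entry["action"], data))
    traces = {}
    for user, events in per_user.items():
        events.sort(key=lambda event: event[0])
        rows, previous = [], None
        for time, action, data in events:
            gap = None if previous is None else (time - previous).total_seconds()
            rows.append((time, action, data, gap))
            previous = time
        traces[user] = rows
    return traces
```

Sorting by the parsed timestamp rather than by id keeps the sketch independent of how ids are assigned.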
The most important action types include:
routeStep: indicating the user navigated to a different route in the app, e.g. accessing a document
openModal: indicating the user opened a modal of the type specified in the parameters
actionClick: indicating the user clicked a button in a table, with details of the button and the table row
pdfPageVisibilityChange: indicating that a particular page was rendered or unrendered on the user’s screen. The entry includes the page number and the vertical offset of the page start, which contextualize the following action type.
annotatorScrollActivity: indicating that the user scrolled within the PDF (recorded at 500 ms resolution), together with the relative offset within the PDF.
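As an example of deriving a higher-level measure, the sketch below estimates how long each PDF page was visible to one user. It pairs each pdfPageVisibilityChange event with isVisible true with the next event for the same page with isVisible false. It assumes the decoded data layout shown in the example above (documentId, visibility.pageNumber, visibility.isVisible). Pages that are still visible at the end of the trace are not counted. The function name is hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime


def page_visibility_seconds(stats, user_id):
    """Sum the seconds each (documentId, pageNumber) was visible for one user.

    A page counts as visible from an event with isVisible == True until the
    next event for the same page with isVisible == False.
    """
    events = []
    for entry in stats:
        if entry["userId"] == user_id and entry["action"] == "pdfPageVisibilityChange":
            # "Z" is replaced because older Python versions reject it in fromisoformat.
            time = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
            events.append((time, json.loads(entry["data"])))
    events.sort(key=lambda item: item[0])

    visible_since = {}
    totals = defaultdict(float)
    for time, data in events:
        key = (data["documentId"], data["visibility"]["pageNumber"])
        if data["visibility"]["isVisible"]:
            visible_since.setdefault(key, time)  # keep the earliest start if repeated
        elif key in visible_since:
            totals[key] += (time - visible_since.pop(key)).total_seconds()
    return dict(totals)
```

The same pairing idea can be combined with annotatorScrollActivity entries, e.g. to relate scroll positions to the pages that were on screen at that moment.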