Managing Exception Handling (Event)

Introduction

When a task/document gets processed in a Straatos workflow step, the task may produce an error. In this case, an Error Event is called. This page describes how errors are handled within Straatos.

This Error Handling applies to the following steps:

Script Task
Service Task

The error handling does not apply to 'User Task'.

Default Error Behaviour

If no Error Boundary Event is defined, the task remains in the workflow step that caused the error and an Error Message is displayed for the task.

The diagram above shows a snapshot from the Straatos Process Monitor. The 'Processing Step' has one document with an error.

Adding the Error Event

The Error Boundary Event can be added to a step in the workflow when an action should be taken if an error occurs. For example, the document could be assigned to a user or a notification could be sent.

In order to add an Error Event:

Open the Process Designer

From the left Toolbar select the 'Error Boundary Event' icon

Drag the 'Error Boundary Event' onto the border of the step where you would like the error handling to happen

You don't need to configure any settings for the Error Boundary Event. The Error Event is triggered as follows:

In a Service Task, the event is triggered when an error occurs.

In a Script Task, the event is triggered when the script calls
straatos.setError("your error message");

In a User Task, the event is not triggered.
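
As an illustration of the Script Task case, the sketch below wraps a failing operation in try/catch and reports the failure through straatos.setError. Only straatos.setError comes from this page. The straatos stub, callExternalService and runScriptTask are hypothetical names added so the snippet can run outside Straatos.

```javascript
// Stand-in for the object Straatos provides to scripts (assumption: only
// setError is modelled here, so the sketch can run outside Straatos).
const straatos = {
    lastError: null,
    setError(message) { this.lastError = message; }
};

// Hypothetical operation that may fail, e.g. a call to an external system.
function callExternalService(shouldFail) {
    if (shouldFail) {
        throw new Error("Service unavailable");
    }
    return "ok";
}

// Body of a Script Task: report failures so the Error Boundary Event fires.
function runScriptTask(shouldFail) {
    try {
        return callExternalService(shouldFail);
    } catch (e) {
        straatos.setError("Processing failed: " + e.message);
        return null;
    }
}

runScriptTask(true);
console.log(straatos.lastError); // Processing failed: Service unavailable
```

In a real Script Task the stub would be dropped, since Straatos supplies the straatos object itself.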

Just adding the Error Event Handler does not yet take any action. You need to define which action (routing/task) should be executed when the Error Event is triggered.

Routing based on Error Event

When an Error Event is triggered, the task is routed in the same way as in normal workflow routing.

Example 1: Assign an Error Document to a User

In the above Diagram, if the FTP Export returns an error, the error document is routed to a web validation step for a user to look at the document.

Example 2: Send an Email Notification when an Error occurs

In the above Diagram, if the 'Processing Step' returns an error, the error document is routed to a step that sends an email notification. This could be ideal for unattended tasks. Instead of sending an email, a scripted task could also be used to integrate directly into a Support System such as Zendesk.

Elaborate Error Handling (Example)

This example shows a more elaborate Error Handling. In this case, the error handling is intended for a high-volume, unattended processing step with a potentially high error rate, for example an FTP integration or an integration with a third-party web service.

Objective/Goal from an operations perspective

Auto-resolve errors when possible. For example, if the external web service called in the 'Processing Step' is temporarily unavailable (network issues, overload etc.), the workflow should retry a number of times.
If the error is permanent, a user should be informed. The user can then route the document back to the task or decide not to process the document.
Send an error notification email to the user, so the user knows something is wrong.
If high volumes are processed and all documents fail permanently (for example 5,000 documents), the user should not receive an email for each document (e.g. not 5,000 emails for 5,000 failed documents). Instead, the user should receive, for example, only one email every half hour, and only if additional documents have landed in the error queue.

Process

The process to start the error handling is kept simple for this example. It consists of a Start Event, the 'Processing Step', and an End Event. The 'Processing Step' has the 'Error Boundary Event'.

The 'Processing Step' is the production step where the external webservice is called in the objective above. In your environment, this will be the step you want to implement the error handling for.

Error Handling

As a first step, I add a Scripted Task 'Error Handling' to the process. In this step, I will check if the document has been looped long enough through the module to consider it a permanent error or if the document should be retried.

The Error Handling has two outcomes. Either the document is retried, or it is marked as a permanent error (which is triggered by an Error Handler).

I need to set up a workflow field called 'ErrorCounter' (as a string) to store the number of retries.

The script I use in the error handling looks like this:

// ErrorCounter is a workflow field (stored as a string)
if (ErrorCounter !== null) // Not null: the document has been through the error handling step before
{
    // If the document has already been retried automatically 144 times, trigger an error
    if (Number(ErrorCounter) >= 144)
    {
        straatos.setError("Retry for Document " + _documentId + " failed for " + ErrorCounter + " subsequent retries.");
        ErrorCounter = '0';
    }
    else {
        // Fewer than 144 retries so far: increase the error counter by one
        ErrorCounter = (Number(ErrorCounter) + 1).toString();
    }
}
// If the document enters the error handling for the first time, set the counter to 1
else {
    ErrorCounter = '1';
}
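
The counter logic can also be tried outside Straatos as a pure function. The following is a minimal sketch, assuming the intended limit is exactly 144 retries. MAX_RETRIES and nextErrorState are illustrative names, not part of Straatos.

```javascript
// Pure-function version of the retry decision, for experimenting outside
// Straatos. MAX_RETRIES and nextErrorState are illustrative names.
const MAX_RETRIES = 144;

// Takes the current 'ErrorCounter' value (string or null) and returns the
// new counter value plus whether the document is now a permanent error.
function nextErrorState(errorCounter) {
    const retriesSoFar = Number(errorCounter) || 0; // null, '' or junk -> 0
    if (retriesSoFar >= MAX_RETRIES) {
        return { errorCounter: '0', permanentError: true };
    }
    return { errorCounter: String(retriesSoFar + 1), permanentError: false };
}

// Simulate a document that fails on every attempt.
let state = { errorCounter: null, permanentError: false };
let retries = 0;
while (true) {
    state = nextErrorState(state.errorCounter);
    if (state.permanentError) break;
    retries++;
}
console.log(retries); // 144
```

Keeping the limit in one named constant makes it easier to adjust together with the retry delay.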

Automatic Retry Delay

In the step above (Error Handling), the retry will happen immediately after the error handling. The delay between the original execution of the 'Processing Step' and the retry of the 'Processing Step' will be only a few milliseconds. This is normally not practical, especially considering the external web service in our example could be down due to overload, network issues or maintenance.

Hence, the best way will be to delay the retry. In this example, we will delay each retry by 10 minutes.

Add a 'Timer Boundary Event' to the arrow routing the document back to the 'Processing Step'

Click on 'Settings' of the 'Timer Boundary Event' and change the time from 1 day to 10 minutes

Now, when a document hits an error, the document is retried every 10 minutes for a maximum of 144 times. That means the Document is retried for 24 hours before it is marked as a 'Permanent Error'.

It may help to label the timer, so it is easier to understand the process flow. To label the timer, double-click on the timer and enter a text. In this example we use 'Delay 10 min before retry'.

Permanent Error Handling

The document will be routed to a user to take action on a permanent error. The user can view the document and then decide whether the document is retried or no longer processed.

Add a User Task named 'Permanent Error' to which the document is routed for review by the user. In the User Task Settings, enable the buttons 'Complete' and 'Reject'. Rename the labels to 'Terminate' for the Reject button and 'Retry' for the Complete button.
Add an Exclusive Gateway (decision) after the 'Permanent Error' User Task.
Route one path from the decision to an End Event. Click on the line of this terminate route and click 'Settings'.

As the Condition Field, select _status and in 'Equals' type in 'reject'. This has the effect that when the user in web validation clicks on the Reject button (labelled 'Terminate'), the document is routed along the reject path.
Draw a line from the Decision to the 'Processing Step'. No additional condition needs to be set. This will be the default routing.
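
The gateway configured above behaves roughly like the function below. This is only a sketch of the routing logic for illustration. routeAfterPermanentError is a hypothetical name, and the value 'complete' for the Retry button is an assumption, as this page only states the value 'reject'.

```javascript
// Hypothetical model of the Exclusive Gateway after 'Permanent Error'.
function routeAfterPermanentError(status) {
    // Conditional path: _status equals 'reject' -> End Event (terminate).
    if (status === 'reject') {
        return 'End Event';
    }
    // Default path, no condition set: back to the 'Processing Step' (retry).
    return 'Processing Step';
}

console.log(routeAfterPermanentError('reject'));   // End Event
console.log(routeAfterPermanentError('complete')); // Processing Step
```

Because the retry path is the default, any status other than 'reject' sends the document back to the 'Processing Step'.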

Email Notification

So far, the error documents are automatically retried and, if the retries fail for more than 24 hours, they are assigned to a user who can then take appropriate action and retry or terminate a document.

The next step is to add an email notification, so that the user responsible for the Permanent Error task will get the information that documents require his/her attention.

Adding an email notification step just before the User Task 'Permanent Error' would notify the user when a document enters the Permanent Error. However, the notification would be triggered for each document. Hence, if 5,000 documents fail within 5 minutes, 5,000 email notifications would be sent out.

We will add a notification process that sends a new notification at most every 30 minutes, and only if a new document has entered the 'Permanent Error' step in the workflow.

Set a workflow Flag if there is a permanent error

Firstly, we add a 'Script Task' step named 'Set Error Flag' just before the 'Permanent Error' step.

Open the settings for 'Set Error Flag' and enter the following script:

// Use the workflow parameter 'PermError'.
var PermErrorCount = straatos.getWorkflowData('PermError');

if (PermErrorCount !== null && PermErrorCount !== '')
{
    // PermErrorCount is neither null nor empty: increase PermError by one.
    straatos.setWorkflowData('PermError', (Number(PermErrorCount) + 1).toString());
}
else {
    // PermErrorCount is empty or null: set PermError to 1.
    straatos.setWorkflowData('PermError', '1');
}

Tip: The getWorkflowData and setWorkflowData functions can be used to get/set workflow variables. getWorkflowData takes the variable name as its parameter. setWorkflowData takes the name followed by the value to set. These variables are not linked to a document and are shared across the entire workflow.

With this step, we now create a workflow variable that is larger than 0 if a document has passed into the 'Permanent Error' step.

Check for Errors every 30 Minutes

The target is now to check every 30 minutes if at least one new document has been sent to the 'Permanent Error' step.

Add a timer start event
Change the settings of the timer start event to check every hour at 0 and 30 minutes past the hour.
Route the new task to a script step 'Check Permanent Error'

The Timer Start Event will be triggered by Straatos at the configured time, in this example every 30 minutes (specifically on the hour and 30 minutes past the hour). The resulting task is routed to the 'Check Permanent Error' step.
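
To make the schedule concrete, the sketch below computes the next trigger time for a given moment. It only illustrates the "on the hour and 30 minutes past the hour" pattern and is not how Straatos configures or evaluates its timers. nextHalfHourTick is an illustrative name, and the use of UTC is an assumption made to keep the example independent of the local time zone.

```javascript
// Returns the next trigger time strictly after 'now', on the hour or at
// 30 minutes past the hour (UTC). nextHalfHourTick is an illustrative name.
function nextHalfHourTick(now) {
    const next = new Date(now.getTime());
    next.setUTCSeconds(0, 0); // clear seconds and milliseconds
    if (now.getUTCMinutes() < 30) {
        next.setUTCMinutes(30);
    } else {
        next.setUTCMinutes(0);
        next.setUTCHours(next.getUTCHours() + 1); // rolls over at midnight
    }
    return next;
}

console.log(nextHalfHourTick(new Date('2024-01-01T10:05:00Z')).toISOString());
// 2024-01-01T10:30:00.000Z
```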

In the 'Check Permanent Error' task, we now want to check if the variable 'PermError' is larger than 0. We also want to set a workflow index field to indicate if there have been new documents in 'Permanent Error'.

Create a new Workflow Index field 'DocsInPermanentError'
Use the following script in the 'Check Permanent Error' script task:

if(Number(straatos.getWorkflowData('PermError')) > 0){
    //If PermError is larger than 0, documents have been added to the 'Permanent Error' stage. In this case,
    //reset the PermError variable to 0 (resetting the counter), as we are going to send an email notification,
    //and set the workflow field 'DocsInPermanentError' to 'yes'. This will be used for routing.
    straatos.setWorkflowData('PermError', '0');
    DocsInPermanentError = 'yes';
}
else{
    //If PermError is 0, no documents have been added to the 'Permanent Error' stage since the last reset and hence
    //no notification should be sent.
    DocsInPermanentError = 'no';
}
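
The interplay between the 'Set Error Flag' and 'Check Permanent Error' scripts can be simulated outside Straatos. The following sketch assumes that getWorkflowData and setWorkflowData behave like a simple string key/value store. The in-memory straatos stub and the function names setErrorFlag and checkPermanentError are hypothetical and exist only for this illustration.

```javascript
// In-memory stand-in for Straatos workflow data (assumption: the real
// functions behave like a simple string key/value store).
const store = {};
const straatos = {
    getWorkflowData(name) { return name in store ? store[name] : null; },
    setWorkflowData(name, value) { store[name] = value; }
};

// 'Set Error Flag': runs once per document entering 'Permanent Error'.
function setErrorFlag() {
    const count = Number(straatos.getWorkflowData('PermError')) || 0;
    straatos.setWorkflowData('PermError', String(count + 1));
}

// 'Check Permanent Error': runs on every timer tick; returns 'yes' or 'no'.
function checkPermanentError() {
    if (Number(straatos.getWorkflowData('PermError')) > 0) {
        straatos.setWorkflowData('PermError', '0'); // reset the counter
        return 'yes';
    }
    return 'no';
}

// 5,000 documents fail between two timer ticks.
for (let i = 0; i < 5000; i++) setErrorFlag();
const firstTick = checkPermanentError();  // one notification for all 5,000
const secondTick = checkPermanentError(); // nothing new since the reset
console.log(firstTick, secondTick); // yes no
```

The simulation shows the intended effect: many failed documents between two ticks lead to a single 'yes', and a tick without new failures leads to 'no'.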

Routing the 'Check Permanent Error' Task

Now that the script above has determined if an email notification should be sent, the task can be routed either to send an error email notification or to terminate if no notification should be sent.

Add an Exclusive Gateway (decision) after the 'Check Permanent Error' step.
Add a routing path to an End Event 'No Errors found' for when no email notification should be sent.
Add a routing path to a 'Service Task' and name the Service Task 'Send Email Notification', then route from 'Send Email Notification' to an End Event. Don't forget to configure the Email Notification Task (use SendGrid and send the appropriate information).
Click on the routing path to the 'Send Email Notification' task and click on Settings. Select 'DocsInPermanentError' as 'Condition Field' and enter 'yes' in 'Equals'.

Final Tips, Tricks and Improvement to the process

The 'Check Permanent Error' process creates a large number of tasks in the workflow over time (one task every 30 minutes), so it makes sense to purge the data in the End Event frequently. The best way to do this is to enable Purge with a low number of days. This keeps the number of tasks down.

Each retry in the error handling is delayed by a fixed 10 minutes. It may make sense to increase the delay with each retry so that both Straatos and the called system are not kept too busy. This could be done in the 'Error Handling' script, where the next retry date and time is determined by the script, e.g. first retry after 10 seconds, then after 20 seconds, 40 seconds, 80 seconds etc.
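
The doubling delay mentioned above could be calculated as in the following sketch. The function name, its parameters and the upper limit of one hour are assumptions for illustration. How the calculated delay is then applied to the timer in Straatos is not covered here.

```javascript
// Delay before retry number 'retry' (1-based): 10 s, 20 s, 40 s, 80 s, ...
// capped at maxDelaySeconds. All names and defaults are illustrative.
function retryDelaySeconds(retry, baseSeconds = 10, maxDelaySeconds = 3600) {
    const delay = baseSeconds * Math.pow(2, retry - 1);
    return Math.min(delay, maxDelaySeconds);
}

const firstFour = [1, 2, 3, 4].map(n => retryDelaySeconds(n));
console.log(firstFour); // [ 10, 20, 40, 80 ]
```

The cap keeps the delay from growing without bound. Without it, the tenth retry would already wait 5,120 seconds.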

If the main process (not the error handling) contains more steps, the error handling could be organised in a subprocess to keep the main flow easy to read.

To make the error handling process easier to read, as essentially there are two separate processes involved, you can use swimlanes and comments.
