Self-healing FileMaker WebDirect Routing

A WebDirect deployment can have multiple worker servers (up to 5, or up to 10 on Ubuntu) where users' sessions live. The main FileMaker Server routes incoming sessions to the healthy workers to balance the load across them.
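To make that routing concrete, here is a minimal sketch of the idea in Python: a health-aware, least-connections pick from the worker pool. The `Worker` record and its fields are illustrative assumptions, not Claris' actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    host: str
    healthy: bool         # result of the most recent health check (assumed field)
    active_sessions: int  # sessions currently hosted on this worker (assumed field)

def route_new_session(workers: list[Worker]) -> Worker:
    """Route an incoming session to the healthy worker with the fewest sessions."""
    healthy = [w for w in workers if w.healthy]
    if not healthy:
        raise RuntimeError("No healthy WebDirect workers available")
    target = min(healthy, key=lambda w: w.active_sessions)
    target.active_sessions += 1
    return target
```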

In our Soliant.cloud environment, we can spin up and spin down those workers dynamically based on the total number of WebDirect clients; we are the only FileMaker hosting provider that has this capability.
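The scaling decision itself is simple arithmetic: compare the total WebDirect client count against the number of sessions you are willing to put on a single worker. A rough sketch follows; the per-worker capacity and the worker limits are illustrative assumptions, not Soliant.cloud's actual settings.

```python
import math

SESSIONS_PER_WORKER = 80   # assumed comfortable load per worker (illustrative)
MIN_WORKERS = 1
MAX_WORKERS = 10           # Ubuntu deployments support up to 10 workers

def desired_worker_count(total_webdirect_clients: int) -> int:
    """Scale the worker pool up or down with the total client count."""
    needed = math.ceil(total_webdirect_clients / SESSIONS_PER_WORKER)
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))
```

Under these assumptions, `desired_worker_count(250)` would call for 4 workers; as clients disconnect, the same calculation lets the pool shrink again.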

In recent weeks, we discovered a bug in how Claris implemented that routing: when one of the worker servers becomes unresponsive, all routing attempts stall. We are working with Claris to resolve the bug, but in the meantime, we have implemented our own routing based on a thorough understanding of how Claris' mechanism is supposed to work. It helps, of course, to have wizards like Karl Jreijiri and Mike Duncan on the team to make figuring this out easy.

With our deployment, an unresponsive worker is left alive until all of its sessions expire; in the meantime, new incoming sessions are routed to the remaining healthy workers, and a replacement worker is automatically added.
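Continuing the hypothetical `Worker` record from the first sketch, the self-healing behavior amounts to three steps: stop routing to the unresponsive worker, request a replacement, and retire the old worker only after its last session has expired. The provisioning call below is a placeholder for whatever automation your hosting platform provides.

```python
def handle_unresponsive_worker(worker: Worker) -> None:
    """Take a failing worker out of rotation and request a replacement."""
    worker.healthy = False          # routing now skips this worker
    provision_replacement_worker()  # keep overall capacity constant

def retire_if_drained(worker: Worker, workers: list[Worker]) -> None:
    """Remove an unhealthy worker only once every session on it has expired."""
    if not worker.healthy and worker.active_sessions == 0:
        workers.remove(worker)

def provision_replacement_worker() -> None:
    ...  # placeholder: e.g., launch a new worker instance via the cloud provider's API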

So, if you want a robust, dynamic, and cost-effective WebDirect deployment, there is only one place to go: soliant.cloud.

4 thoughts on “Self-healing FileMaker WebDirect Routing”

    1. The underlying bug that makes a worker become unresponsive is still there; it’s been around for a while. We’re working with Claris to help track it down.
      In our environment, and with the work I'm outlining in this article, it doesn't have a big impact anymore. We just leave the unresponsive worker alone until all active sessions on it end (the symptom is that no new sessions can get onto it, but running sessions continue to work, albeit with degraded performance). And since we spin up a new worker in the meantime, we remain at full capacity.

    1. The issue is that if you have 400 total connections across 5 worker machines and one of them hangs, all 80 connections on that worker freeze, making those users think the entire system has crashed. Even if they log off and back in, the remaining workers become overloaded. This really needs to be addressed, as it creates a poor user experience.

        1. Agreed that the fundamental issue that causes a worker to hang needs fixing.
          As a strategy though, with that many webd connections I would suggest using more workers (Ubuntu supports up to 10). In our Soliant.cloud environment, that overloading of the others doesn’t happen because we automatically spin up a new worker to replace the failing one.
