The Visitor pattern is one of the classical design patterns originally described by the Gang of Four. While you might not need it that often in your own code, chances are high that you will find it in a third-party library you're using (see, for example, the Roslyn compiler).
The pattern can be handy when you have to add functionality to large or complex object structures but don't want to (or can't) edit the types of the objects within those structures. The reason it's not used more often might be that it relies on a technique called double dispatch (more on this later), which can be a bit confusing at first sight.
A small example
We will start with a small example to take a more in-depth look at this pattern.
Let's assume we want to model the employment structure in a software company. We take a very simplified approach and say that our company employs only two types of engineers: juniors and seniors.
Please also note that this design is only for demonstration purposes and would rarely fit a real-life scenario. But for our example, let's assume it's sufficient.
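The model described above could look roughly like the following TypeScript sketch. The class names match the ones used later in the text; the abstract base class and the name property are assumptions made for illustration.

```typescript
// Base type for everyone on the engineering staff.
// Assumption: a developer carries nothing but a name.
abstract class Developer {
  constructor(public readonly name: string) {}
}

// The two experience levels. They add no behavior of their own yet.
class JuniorDeveloper extends Developer {}
class SeniorDeveloper extends Developer {}

const bob: Developer = new JuniorDeveloper('Bob');
const lisa: Developer = new SeniorDeveloper('Lisa');
```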
Separation of Concerns
The first task we get from our product owner is to print out the names of all developers working in our company.
We could add this functionality directly to the Developer class, but to separate the different concerns, we put it into an external class instead. In that way, our Developer class contains only domain-specific logic (which, I have to admit, is currently relatively sparse).
Since our external class has to visit each Developer object to request its name for printing, we will call it PrintingVisitor.
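A first version of such a visitor might look like the sketch below. The printed list is an assumption added here so that the output can be inspected afterwards; the Developer classes are repeated only to keep the snippet self-contained.

```typescript
abstract class Developer {
  constructor(public readonly name: string) {}
}
class JuniorDeveloper extends Developer {}
class SeniorDeveloper extends Developer {}

// Lives outside the Developer hierarchy and only reads public data.
class PrintingVisitor {
  // Assumption: every printed line is also recorded for later inspection.
  readonly printed: string[] = [];

  visit(developer: Developer): void {
    this.printed.push(developer.name);
    console.log(developer.name);
  }
}
```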
By doing so, we've already successfully implemented the first part of our Visitor pattern: We added new functionality to our existing developers without modifying their classes. This is especially useful if the types we want to add functionality to are part of an external library, and we can't modify or extend their implementation directly.
To finally print our developers' names, we just have to organize them in a simple list and call our visit method on each object.
const developers: Developer[] = [
    new JuniorDeveloper('Bob'),
    new SeniorDeveloper('Lisa')
];
const visitor = new PrintingVisitor();
developers.forEach(d => visitor.visit(d));
But we are not finished yet, as there is another problem we have to solve...
Implementing Type-Specific Functionality
We get a new requirement from our PO: This time, the experience level (junior or senior) should be printed beside the developer's name. The result should look something like this:
Bob (junior)
Lisa (senior)
Again, this seems like an easy task. We just have to add some method overloads to our PrintingVisitor, each handling a specific developer subtype.
Note that at compile time, our objects are all of type Developer (the element type of our array from the previous code snippet). However, at runtime, it should be possible to determine the precise type of each instance (either JuniorDeveloper or SeniorDeveloper) and use it to choose the correct overload - or not?
While that sounds good, it does not work that way, at least not for most OOP languages...
No matter what concrete type our variable has at runtime, we always end up with a call to the least specific overload: visit(developer: Developer).
The reason is that many languages (Java, C#, TypeScript, Python - to name only a few) implement a so-called single dispatch approach: During a method call, the choice of which method to execute at runtime depends only on a single object - the receiver, that is, the one implementing the operation. Since we have no Visitor class hierarchy, this call will always go to the PrintingVisitor.
The concrete types of the argument objects are not resolved during method dispatch. Instead, only their compile-time types are used for the method lookup. So whether the actual type during execution is JuniorDeveloper or SeniorDeveloper does not affect the final choice.
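TypeScript erases overload signatures at compile time and allows only one implementation body per method, so it cannot reproduce this behavior literally. The following sketch therefore models the three overloads as separately named methods (visitJunior, visitSenior, and visitDeveloper are hypothetical names, as is the level property). It illustrates the same constraint: with only the static type Developer available at the call site, the compiler accepts just the least specific variant.

```typescript
abstract class Developer {
  constructor(public readonly name: string) {}
}
class JuniorDeveloper extends Developer {
  readonly level = 'junior';
}
class SeniorDeveloper extends Developer {
  readonly level = 'senior';
}

// Each method stands in for one overload of visit.
class PrintingVisitor {
  visitJunior(developer: JuniorDeveloper): string {
    return `${developer.name} (${developer.level})`;
  }
  visitSenior(developer: SeniorDeveloper): string {
    return `${developer.name} (${developer.level})`;
  }
  visitDeveloper(developer: Developer): string {
    return developer.name;
  }
}

const developers: Developer[] = [
  new JuniorDeveloper('Bob'),
  new SeniorDeveloper('Lisa')
];
const visitor = new PrintingVisitor();

// d is statically a Developer, so only visitDeveloper type-checks here.
// visitor.visitJunior(d) would be rejected by the compiler.
const lines = developers.map(d => visitor.visitDeveloper(d));
// lines is ['Bob', 'Lisa']: the experience level is lost.
```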
While this might sound like a substantial limitation, it helps to keep the dispatcher's implementation efficient and transparent. For most cases, this single dispatch approach is also sufficient.
But in our case, it would be really helpful if the correct method were determined by both its receiver and its parameter. And luckily, there is a solution for that - it's called Double Dispatch.
Double Dispatch
If we want the dispatcher to call the correct method overload, we must provide it with the proper parameter types already at compile time.
To achieve this, we can use a small trick:
First, we add a new method, accept, to our Developer class.
This method will receive our visitor as its argument.
In the method body, we call our visitor's visit method, but this time with a reference to self (or this) as the input parameter.
And voilà:
Since this (or self) inside a method always has the static type of the class that declares the method, the dispatcher can choose the proper method overload at compile time - job done!
If you look at these two invocations, it becomes obvious why this approach is called double-dispatch: We use two method calls to narrow down the types involved in the process:
- First dispatch: developer.accept(visitor)
- Second dispatch: visitor.visit(this)
What's important: For this double-dispatch to work, we must override the accept method in every subtype of Developer.
Otherwise, we would still be doing a single dispatch.
See the following diagram for the complete design.
Each subtype of the Developer class implements its own accept method:
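In code, the complete design might look like the sketch below. Because TypeScript permits only a single implementation body per overloaded method, the sketch uses two distinctly named methods, visitJunior and visitSenior (assumed names), in place of the visit overloads described above. The Visitor interface and the printed list are likewise assumptions for illustration.

```typescript
interface Visitor {
  visitJunior(developer: JuniorDeveloper): void;
  visitSenior(developer: SeniorDeveloper): void;
}

abstract class Developer {
  constructor(public readonly name: string) {}
  // First dispatch: resolved at runtime through the receiver's concrete type.
  abstract accept(visitor: Visitor): void;
}

class JuniorDeveloper extends Developer {
  accept(visitor: Visitor): void {
    // Second dispatch: `this` is statically a JuniorDeveloper here.
    visitor.visitJunior(this);
  }
}

class SeniorDeveloper extends Developer {
  accept(visitor: Visitor): void {
    // Second dispatch: `this` is statically a SeniorDeveloper here.
    visitor.visitSenior(this);
  }
}

class PrintingVisitor implements Visitor {
  // Assumption: printed lines are also recorded for later inspection.
  readonly printed: string[] = [];

  visitJunior(developer: JuniorDeveloper): void {
    this.emit(`${developer.name} (junior)`);
  }

  visitSenior(developer: SeniorDeveloper): void {
    this.emit(`${developer.name} (senior)`);
  }

  private emit(line: string): void {
    this.printed.push(line);
    console.log(line);
  }
}
```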
If we apply this design to our previous example, it would produce the expected results:
const developers: Developer[] = [
    new JuniorDeveloper('Bob'),
    new SeniorDeveloper('Lisa')
];
const visitor = new PrintingVisitor();
developers.forEach(d => d.accept(visitor));
// prints:
// Bob (junior)
// Lisa (senior)
This approach may look confusing because it reverses the original invocation direction: In our initial design, we passed each developer to the visitor, but now, we pass the visitor to each developer, which then calls the visitor back.
Traversing Internal Object Structures
Adding an accept method allows the dispatcher to choose the correct method overload.
This additional indirection is a way to implement double-dispatching, but that's not its only benefit:
We can now even traverse complex object structures easily.
In our previous example, our object structure was a flat array of developers, with each element easily accessible. However, in most real-world scenarios, chances are high for us to deal with more complex object structures, like trees or other hierarchical constructs.
Accordingly, let's add a new class, DepartmentHead, to make our example more realistic. Each head supervises several developers, organized in a subordinates list.
For better encapsulation, the subordinates are private (signaled by the - in front of the name), making them inaccessible from outside.
Thanks to our accept method, we still have ways to access each individual developer within the list:
We can override the operation in the DepartmentHead class and let the visitor first visit the head itself. Afterward, we iterate through the subordinates and let each subordinate accept the visitor again.
In that way, we can keep the internals of our object structure private while still letting visitors reach each child element.
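The traversal could be sketched as follows. Several details are assumptions rather than part of the original design: DepartmentHead is modeled as a subtype of Developer, the visitor gets an additional visitHead method, the visit methods carry distinct names because TypeScript has no per-overload bodies, and the head's name 'Ann' is made up.

```typescript
interface Visitor {
  visitJunior(developer: JuniorDeveloper): void;
  visitSenior(developer: SeniorDeveloper): void;
  visitHead(head: DepartmentHead): void;
}

abstract class Developer {
  constructor(public readonly name: string) {}
  abstract accept(visitor: Visitor): void;
}

class JuniorDeveloper extends Developer {
  accept(visitor: Visitor): void { visitor.visitJunior(this); }
}

class SeniorDeveloper extends Developer {
  accept(visitor: Visitor): void { visitor.visitSenior(this); }
}

class DepartmentHead extends Developer {
  // The list is private: callers cannot reach the subordinates directly.
  constructor(name: string, private readonly subordinates: Developer[]) {
    super(name);
  }

  accept(visitor: Visitor): void {
    // Visit the head itself first ...
    visitor.visitHead(this);
    // ... then pass the visitor on to every subordinate.
    this.subordinates.forEach(s => s.accept(visitor));
  }
}

class PrintingVisitor implements Visitor {
  readonly printed: string[] = [];

  visitJunior(developer: JuniorDeveloper): void { this.emit(`${developer.name} (junior)`); }
  visitSenior(developer: SeniorDeveloper): void { this.emit(`${developer.name} (senior)`); }
  visitHead(head: DepartmentHead): void { this.emit(`${head.name} (head)`); }

  private emit(line: string): void {
    this.printed.push(line);
    console.log(line);
  }
}

const head = new DepartmentHead('Ann', [
  new JuniorDeveloper('Bob'),
  new SeniorDeveloper('Lisa')
]);
const visitor = new PrintingVisitor();
head.accept(visitor);
// prints:
// Ann (head)
// Bob (junior)
// Lisa (senior)
```

The client only ever calls accept on the head; the order of the traversal is decided inside DepartmentHead.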
Conclusion
That's basically all there is to know about the Visitor pattern. To sum it up:
- It allows us to add new functionality to existing classes without changing their implementation. The new functionality is implemented by a Visitor class that our object structure has to accept.
- By using a double-dispatch mechanism, we can execute type-specific visiting logic without the need for explicit type casts or reflection/introspection. This might be the most confusing part of the pattern as it turns the invocation direction around. Still, it also makes the Visitor very flexible and is one of its key concepts.
- Using a visitor, we can traverse complex object structures without exposing their internal layout.
Let's now look at some implementation details, as well as potential alternatives to this pattern.
FAQ
This double-dispatching is quite confusing. Isn't there an easier way?
While most OOP languages support only single dispatching, there are some constructs in modern languages to work around this limitation. Let's investigate some of them.
Using runtime type information.
This might be one of the most prominent approaches. Instead of overloading the visit method, we use a single method and check the concrete type using runtime type information (or reflection/introspection).
Consider the following example implementation:
visit(element: Element): void {
    if (element instanceof ConcreteElementA) {
        // visiting logic for ConcreteElementA
    } else if (element instanceof ConcreteElementB) {
        // visiting logic for ConcreteElementB
    } else {
        // default case
    }
}
This approach and similar ones (e.g., using pattern matching) don't require double-dispatching, making an additional accept method unnecessary.
However, a disadvantage of these techniques is that all logic for all types is now concentrated in a single "visit" method.
This violates the Open-Closed Principle because it requires us to change our existing code every time a new class is added to the object structure.
Besides that, the accept method can also contain additional logic (e.g., for traversing) that would otherwise have to be placed inside our visitor.
Nevertheless, these approaches can still be viable, especially for smaller use cases.
Using dynamic dispatching.
Languages like C# allow switching from the default static binding to dynamic binding via the Dynamic Language Runtime (DLR). Using the DLR, the dispatcher resolves the concrete parameter types at runtime and can find a more suitable method overload. This also eliminates the need for an additional accept method for double dispatching.
To activate dynamic dispatching in C#, all we have to do is cast the argument of our visit call to dynamic:
visitor.visit((dynamic)element);
While this approach works, opinions differ on whether it is a good solution. On the one hand, it reduces the overall code required for the Visitor. On the other hand, dynamic dispatching can slightly affect performance (see, for instance, this benchmark for a comparison). Also, some people argue that using the DLR in a situation where it's technically not required can harm the readability of your code.
How do I know whether this pattern is a good fit for my concrete use case?
While this depends on several factors, there are some indicators you might consider:
Do I have an object structure that can accept visitors?
This one is relatively obvious:
For double-dispatch to work, you need some kind of accept method to reverse the calling direction between your objects and the visitors.
If such a method exists or could be added by you, your system may be a candidate for implementing the Visitor.
Otherwise, if you still want to use this pattern, you can also try one of the approaches mentioned earlier that don't require double-dispatching.
Is the type hierarchy of my object structure stable?
If you often find yourself adding new subtypes to the hierarchy you're visiting, using the Visitor pattern can be problematic.
Each new subtype requires updating all existing visitors with an additional visit method for this new type.
Depending on the number of Visitor classes you have, this can be a considerable implementation effort.
Having a default implementation helps reduce this effort; however, such an implementation is not always possible.
In that case, adding the functionality directly to the object structure might be a better solution.
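One way to provide such a default is an abstract base visitor whose type-specific methods all fall back to a shared handler, so that existing visitors keep compiling when a new visit method is added to the base. The sketch below again uses distinctly named visit methods as a TypeScript stand-in for overloads; BaseVisitor, visitDefault, and SeniorCounter are hypothetical names.

```typescript
abstract class Developer {
  constructor(public readonly name: string) {}
  abstract accept(visitor: BaseVisitor): void;
}

class JuniorDeveloper extends Developer {
  accept(visitor: BaseVisitor): void { visitor.visitJunior(this); }
}

class SeniorDeveloper extends Developer {
  accept(visitor: BaseVisitor): void { visitor.visitSenior(this); }
}

abstract class BaseVisitor {
  // Fallback for every type-specific method a subclass does not override.
  protected abstract visitDefault(developer: Developer): void;

  visitJunior(developer: JuniorDeveloper): void { this.visitDefault(developer); }
  visitSenior(developer: SeniorDeveloper): void { this.visitDefault(developer); }
}

// Handles seniors explicitly; every other type goes through the default.
class SeniorCounter extends BaseVisitor {
  seniors = 0;
  others = 0;

  protected visitDefault(_developer: Developer): void { this.others++; }
  visitSenior(_developer: SeniorDeveloper): void { this.seniors++; }
}

const team: Developer[] = [
  new JuniorDeveloper('Bob'),
  new SeniorDeveloper('Lisa')
];
const counter = new SeniorCounter();
team.forEach(d => d.accept(counter));
// counter.seniors is 1, counter.others is 1
```

Adding a further subtype would then mean one new method on BaseVisitor that delegates to visitDefault, while SeniorCounter stays untouched. Whether a meaningful default exists still depends on the visitor.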
Can my visitors access all relevant information?
When using the Visitor pattern, you move implementation out of your object structure into your visitors.
That means all data required by your visitors must be publicly available.
Otherwise, you cannot externalize this logic - unless your language supports constructs like the friend keyword in C++, which allows you to access private or protected members.