Update to orginal post: After doing some more research, the actual problem turned out to be that the legacy Windows NT service never specified the WindowsPrincipal and had nothing to do with the Gateway assembly. In fact, it turns out that when you load a class using Activator.CreateInstanceFrom(), the class loads into the same parent application domain. I've updated the solution below appropriately.
Thanks to Rick Strahl for clarifying the Activator.CreateInstanceFrom(), app domain behavior. You can also find an excellent article by Rick in CoDe magazine that covers this extensively: http://www.code-magazine.com/article.aspx?quickid=0211081&page=1
- - -
I recently ran into a security problem in a somewhat typical scenario for which the specific details I imagine are fairly esoteric because I was unable to find anything relevant on the web or on MSDN forums. It is amazing how easy it is to forget (or at least suffer from delayed recall) fundamental aspects of the framework when you are in the thick of it and trying to ship a product!
So, I figured I’d post an overview of the scenario and the solution to the problem in hopes of helping anyone else who runs into this.
Scenario
A common approach for setting up any kind of automated process is to use a Windows NT service which is continually running, monitoring a queue or database for records/messages that represent events for wich some work should be performed. For example, I can have a Windows NT Service monitor a private queue every 5 seconds, and when a message is available, process it (whatever process means for the given service).
One option for the processing bit would be to raise and event and delegate the work out to a method that would maintain all of the event handling logic. This handler could reside in a single class within the same assembly as the service, but for obvious reasons this could quickly get unruly and introduce a maintanance nightmare. If every time you need to change the event handling business logic you had to recompile and redeploy the service executable, although you might be building job security, you'd find yourself getting frustrated at best and at worst, you would be tightly coupled to a single assembly making it difficult to leverage this pattern in a more abstract, loosely coupled manner.
One possible solution might be to apply a typical publish and subscribe pattern where the Windows NT service polls a backing store (or ideally was notified automagically) and when a record (or message) meets a specific criteria fire an event. The event would then be mapped to a delegate which implements the handling of the event. However, to keep the implementation of the handler from being static and hard coded, we could further delegate the implementation of the work to an external component, making the original event handling delegate only worry about the plumbing for dispatching the work to another component. Ideally, this would allow a plug and play approach where assemblies could be "dropped" in and just work. Of course, this would require that the message in a queue has some kind of metadata about the component to execute (such as assembly name, type and URI) and when the event is fired, the handler gets access to this URI and dispatches the call to it.
This is precisely what the System.Activator class allow you to. If you know the name of the assembly that contains the component you want to call, simply call the CreateInstanceFrom method on the Activator class. The CLR will then probe the local private directory for the assembly name, load it and instantiate an instance of the type. Figure 1 below summarizes what this might look like.
Figure 1: Mock Sequence Diagram
When the IGateway object is “unwrapped” and made available to the MyNTService Windows NT Service, the service can call a method (or methods) on the instance to do some work. To achieve polymorphism, since the Gateway component that is unwrapped implements the IGateway interface, the service is guaranteed to be able to call a particular method (Execute, in this case).
Figure 2 (below, right) provides a possible object model that supports this design.
The purpose of the Gateway class is to serve as wrapper around a component/service call while maintaining location transparency. This can thought of as a "super proxy" and is really just the classic Service Gateway/Service Agent pattern. The MyNTService Windows NT Service has no idea if the component is local to the machine or process or somewhere out in the cloud. With this flexibility comes significant power in being able to make decisions on distribution in a post-deployment manner.
As the saying goes “with great power comes great responsibility”. Looking again at the sequence diagram in Figure 1, two important things need to be considered.
First, this scenario demonstrates WS-I Profle interop between a WCF service and a managed Windows NT Service running on .NET 1.1 CLR (pretty cool huh?).
Second, between every call in the sequence is an authentication and authorization boundary. Though impractical, the Gateway component could ensure that only the identity of the Windows account running the MyNTService Windows NT Service can call it’s Execute method. More importantly, however, the Gateway should flow the identity of the Windows account running the MyNTService Windows NT Service process to the WCF service that is out in the cloud so that the WCF service can authenticate and authorize the call. Pretty simple,right?
Problem
Well, one of the things that may not be overly apparent is that the assembly in which the Gateway type that implements the IGateway interface resides will actually load in the same application domain that exists within the host NT service process. The MyNTService Windows NT Service knows NOTHING about the Gateway component (other than that it implements a standard interface). When an instance of the Gateway component is activated, we want to run in the security context of the parent process (of the MyNTService Windows NT Service ), however if the host application domain has not specifically set the the Principal Policy, the component will execute with no security context, or to be a bit more technical, will load using the default UnauthenticatedPrincipal enumeration flag. This basically means that the principal under which the MyNTService Windows NT Service is running will not be attached to the thread on which the Gateway component runs. In fact, as the MSDN documentation for the System.Security.Principal.PrincipalPolicy enumeration states “Specifies how principal and identity objects should be created for an application domain. The default is UnauthenticatedPrincipal.”.
Solution
Those keen to the default behavior regarding application domains and Windows Principals already know where this is going, but for those who aren’t sure, stay with me.
Although the MyNTService Windows NT Service was running as a fixed domain account, the WCF service requires transport level authentication and uses role-based authorization to ensure the caller (MyNTService) is authorized. However, calls from the MyNTService Windows NT Service via the Gateway were failing with "The request failed with HTTP status 401: Authorization Required."
I triple checked IIS security settings. My unit tests worked just fine. My test harness also worked and even browsing to the service’s metadata page was working. For some reason, the Gateway was calling the WCF service in a seamingly anonymous manner. This threw me for a loop for a couple of days.
Then, a few days ago, while in the shower (I get many of my “light-bulb on” moments in the shower or in my dreams for some odd reason) the solution hit me like a thunderbolt! The Gateway assembly was being loaded without an authenticated principal, and hence the solution was simple, sweet and elegant:
AppDomain.CurrentDomain.SetPrincipalPolicy(System.Security.Principal.PrincipalPolicy.WindowsPrincipal);
Now the Gateway took on the identity of the Windows AD account that was running the Windows NT Service process and was able to authenticate to and be authorized by the WCF service.
You can find more info on principal policy and application domain behavior here: http://msdn2.microsoft.com/en-us/library/90395801.aspx