Structure and Design
Commands
The dataflow has a distinct sequence of steps, so it was logical to break the structure down by these steps. A command object wraps each step.
![sequenced-process-commands](/esrgo-documentation/assets/img/sequenced-process-commands.6b761a47.jpg)
Command Pattern Uses - Composable Runtime
Commands allow the runtime steps to be composed externally. The program host is a Console Application, so the internal steps that need to be run can be specified as command line parameters. For example, the transport layer could be FTP or web service (or none if the files were brought to the LOCAL/IN directory by some other external means). The data required might vary by client, so the appropriate set of data stream commands could also be chosen in the command line.
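A minimal sketch of this composition, in TypeScript for illustration (the application itself is C#, and the command and step names here are assumptions, not the actual EsrGo types). Each step implements a common interface, and the host resolves the command line parameters into an ordered list of steps to execute:

```typescript
// Each runtime step implements a common Command interface.
interface Command {
  name: string;
  execute(): void;
}

class FtpTransportCommand implements Command {
  name = "ftp";
  execute() { /* fetch files into LOCAL/IN */ }
}

class CsvImportCommand implements Command {
  name = "import";
  execute() { /* parse the staged files */ }
}

// Registry of available steps; the actual set would vary per client.
const registry: Command[] = [new FtpTransportCommand(), new CsvImportCommand()];

// Compose the run externally from command line parameters, e.g. "ftp import".
// Omitting "ftp" covers the case where files arrive by other external means.
function composeRun(args: string[]): Command[] {
  return args.map(a => {
    const cmd = registry.find(c => c.name === a);
    if (!cmd) throw new Error(`Unknown step: ${a}`);
    return cmd;
  });
}

composeRun(["ftp", "import"]).forEach(c => c.execute());
```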
Events
One of the most volatile areas of the application was matching the ESR model to the HealthRoster model. When live client data was processed, we encountered additional requirements for transformations in the data model. We wanted to isolate the code for these transformations, and also provide some activity tracking per process run.
The Event module was responsible for evaluating the incoming data and classifying the events required to process it. Event modelling gave us:
- Volatile code isolation
- Correct execution order regardless of data presentation order (e.g. reference data before transactional data)
- A person level audit trail that fits with the user mental model
Event Factory Methods
Events are evaluated by a set of factory methods that share the same signature: a staging record parameter and an `IEnumerable` yield of event records. The events to be evaluated depend on the file format, so the list of factories to apply is composed in the class calling the event evaluator (a `List<Func<TIn, TOut>>`).
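The shared signature can be sketched as follows, using TypeScript generators in place of C# `yield return` (the record shapes and event names are illustrative assumptions; the real evaluators live in the Event module):

```typescript
// Staging record in, iterable of event records out - the equivalent of
// a C# List<Func<StagingRecord, IEnumerable<EventRecord>>> entry.
interface StagingRecord { personId: string; endDate?: string; }
interface EventRecord { kind: string; personId: string; }

type EventFactory = (rec: StagingRecord) => Iterable<EventRecord>;

// Two example factories; a factory may yield zero, one, or many events.
function* personChanged(rec: StagingRecord): Iterable<EventRecord> {
  yield { kind: "PersonChanged", personId: rec.personId };
}

function* assignmentEnded(rec: StagingRecord): Iterable<EventRecord> {
  if (rec.endDate) yield { kind: "AssignmentEnded", personId: rec.personId };
}

// The caller composes the factory list per file format; the evaluator then
// simply applies every factory to every record.
function evaluate(records: StagingRecord[], factories: EventFactory[]): EventRecord[] {
  const events: EventRecord[] = [];
  for (const rec of records)
    for (const f of factories) events.push(...f(rec));
  return events;
}
```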
Pipeline Architecture
The full snapshot file has 100,000 records, which is large enough to justify pipelining the records using the `IEnumerable` and `yield return` pattern. Constructing pipeline filter functions is straightforward as long as care is taken to use the `yield` keyword at all return points.
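A small example of such a filter chain, again sketched with TypeScript generators (the row shape is illustrative). Each filter takes and returns an iterable, so records stream through one at a time instead of being materialised as a 100,000-row list:

```typescript
interface Row { id: number; valid: boolean; }

function* validate(rows: Iterable<Row>): Iterable<Row> {
  for (const row of rows) {
    // Every exit path must yield; mixing in a plain `return row`
    // would silently truncate the stream.
    if (row.valid) yield row;
  }
}

function* transform(rows: Iterable<Row>): Iterable<Row> {
  for (const row of rows) yield { ...row, id: row.id * 10 };
}

// Filters compose by wrapping; nothing executes until the result is consumed.
const result = [...transform(validate([
  { id: 1, valid: true },
  { id: 2, valid: false },
  { id: 3, valid: true },
]))];
```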
Dictionary and Hash Set Cache
Pipeline functions process a single record at a time, but some need visibility of a set of records; for example, the Implicit Close event evaluator needs to see what has previously been processed.
To implement this, we provide a cache of the minimum information required: a hash set cache of the entity keys. This is actually a two-level cache, with a Dictionary keyed on Id containing the hash sets, allowing access to either key sets or individual keys.
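A minimal version of that two-level structure, in TypeScript for illustration (the class and method names are assumptions; the real cache is C# with `Dictionary` and `HashSet`):

```typescript
// A Map keyed on entity Id, each slot holding a Set of the keys already
// seen, so callers can test an individual key or fetch the whole set.
class KeyCache {
  private cache = new Map<string, Set<string>>();

  add(id: string, key: string): void {
    let keys = this.cache.get(id);
    if (!keys) { keys = new Set(); this.cache.set(id, keys); }
    keys.add(key);
  }

  // Individual key lookup, e.g. "has this record been processed already?"
  contains(id: string, key: string): boolean {
    return this.cache.get(id)?.has(key) ?? false;
  }

  // Whole key set, e.g. for an Implicit Close style sweep.
  keysFor(id: string): ReadonlySet<string> {
    return this.cache.get(id) ?? new Set();
  }
}
```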
Repositories
Entity Framework is the ORM to the database, and the DbContext object of EF provides a lot of the Repository and UnitOfWork functionality.
Decoupling and mocks
Repositories are used between the process logic and EF to decouple, so that mocks can be used in tests. They also make the database access methods more explicit, and provide encapsulation of caching and other performance enhancements such as SqlBulkCopy.
Aggregate Roots
Since there are 14 different record types in the data source, the repositories are structured around aggregate roots to reduce code clutter, i.e.
- Person related types
- Reference Data types
- Audit data
Facade for Event Processing
Each of these classes has a manageable amount of code, and a feel of structural consistency (cohesion). However, when loading data in from a CSV source, all three repositories are required so there is also an EventRepository that composes these three repositories together. For example, a single Insert method on the EventRepository will choose which sub-repository to send the data to, based on the interfaces implemented by the object.
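The dispatch can be sketched like this. TypeScript has no runtime interface checks, so a discriminant field stands in for the C# `is PersonActionable` style test; the names `PersonActionable` and `RefDataActionable` appear in the Abstractions section, while the method body and counters are assumptions for illustration:

```typescript
interface PersonActionable { kind: "person"; payload: unknown; }
interface RefDataActionable { kind: "refData"; payload: unknown; }
interface Auditable { kind: "audit"; payload: unknown; }
type Insertable = PersonActionable | RefDataActionable | Auditable;

// Facade: one Insert method, three sub-repositories behind it.
class EventRepository {
  inserted = { person: 0, refData: 0, audit: 0 };

  insert(item: Insertable): void {
    switch (item.kind) {
      case "person":  this.inserted.person++;  break; // -> person repository
      case "refData": this.inserted.refData++; break; // -> reference data repository
      case "audit":   this.inserted.audit++;   break; // -> audit repository
    }
  }
}
```

The counters stand in for the real sub-repository calls; the point is that callers loading a CSV source only ever see the single facade.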
Non-SQL Data Stores
The DataFixRepository, also used by the EventRepository, retrieves data from a local XML file instead of the database. This application also needs to be ready for use with an Enterprise Service Bus that is under development. It is expected that the ReferenceData will be accessed via a Master Data Service on the ESB rather than directly from the database.
Abstractions
There are several interfaces and abstract classes in the model which enable the event sourcing and related code.
- ITemporal - the basis for temporal extension methods like IsCurrent() and CurrentOrNext()
- IEntityKeyInfo - defines the minimum data for comparing incoming data to repository data and event evaluation. Enables the KeyCache.
- IEventTarget - Allows generic handling of all incoming types via covariance
- PersonActionable and RefDataActionable - enables the RepositorySelector so that the event handlers can pass data off to the correct repository
- PersonAudit - Event store (audit trail) for downstream debugging and support
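A sketch of the `ITemporal` idea from the list above, in TypeScript for illustration (the real extension methods are C#; the exact semantics here, such as inclusive end dates, are assumptions):

```typescript
// A temporal record carries start/end dates; helpers decide which
// version applies on a given date.
interface Temporal { start: Date; end?: Date; }

// IsCurrent(): the record covers the given date (open end = still current).
function isCurrent(t: Temporal, on: Date): boolean {
  const d = on.getTime();
  return t.start.getTime() <= d && (t.end === undefined || d <= t.end.getTime());
}

// CurrentOrNext(): the current version, else the earliest future one.
function currentOrNext<T extends Temporal>(versions: T[], on: Date): T | undefined {
  return versions.find(v => isCurrent(v, on))
      ?? versions.filter(v => v.start.getTime() > on.getTime())
                 .sort((a, b) => a.start.getTime() - b.start.getTime())[0];
}
```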
![interface-hierarchy](/esrgo-documentation/assets/img/interface-hierarchy.c9f94175.png)
Transforms - Matching the Models
At the attribute level the data in the ESR Payroll system and the HealthRoster rostering system are fairly similar, but the business focus of each system is quite different so transformations were required to match the models.
For example, both systems have temporal versions of records (with start and end dates), but the usage differs, e.g.:
- Person records are dated in ESR (both past and future changes) but are not dated in HealthRoster. This means that EsrGo records multiple ESR versions internally, but outputs a single version to HealthRoster depending on the date of extraction.
- Assignments are dated in both systems; however, in ESR the end date is not always filled in. In HealthRoster the end date is mandatory, so the interface must deduce a value from the start date of the subsequent record.
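The end-date deduction can be sketched as follows, in TypeScript for illustration. The field names are assumptions, as is the exact rule (closing the record the day before the successor starts); the source only says the value is deduced from the start of the subsequent record:

```typescript
interface Assignment { start: string; end?: string; } // ISO dates, sorted by start

// For each open-ended assignment with a successor, deduce an end date
// as the day before the successor's start; the last record stays open.
function deduceEndDates(sorted: Assignment[]): Assignment[] {
  return sorted.map((a, i) => {
    if (a.end !== undefined || i === sorted.length - 1) return a;
    const d = new Date(sorted[i + 1].start);
    d.setUTCDate(d.getUTCDate() - 1);            // day before successor starts
    return { ...a, end: d.toISOString().slice(0, 10) };
  });
}
```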