Our Tools

Summary

As part of building systems for our clients, we frequently develop reusable techniques and tool-like software components to increase our productivity, improve system reliability and transparency, and ease the training and handover process. Most of these tools are independent of the business particulars of the application and can generally be used ready-made, instead of implementing all of their features de novo in each system. These tools are mature and have been used successfully in the numerous systems we have built.

Database gateway

The database gateway is responsible for coordinating, regulating and streamlining the interactions between the non-database and database components of the system. Specifically, it acts as a single point of entry into the database; supports error handling and recording for both system and business errors; supports collection, reporting and analysis of database usage and performance statistics; supports transactional atomicity and concurrency control; allows for the establishment of sophisticated data access and operation authorization regimes; and finally standardizes how data is exchanged between the non-database and database components. The database gateway has been used successfully in numerous applications we have built.
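
As an illustration only (the table, procedure and parameter names below are hypothetical, not our actual interface), a minimal sketch of such a single point of entry in Transact SQL could look like this:

    -- Hypothetical sketch of a gateway entry point; all names are illustrative.
    CREATE PROCEDURE dbo.Gateway_Execute
        @UserName   sysname,        -- caller identity
        @Operation  nvarchar(128),  -- logical operation being requested
        @Params     xml             -- operation parameters as XML
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @Started datetime2 = SYSUTCDATETIME();

        -- Authorization: the operation must be granted to this user in metadata.
        IF NOT EXISTS (SELECT 1 FROM dbo.OperationGrant
                       WHERE UserName = @UserName AND Operation = @Operation)
        BEGIN
            INSERT dbo.GatewayLog (UserName, Operation, Outcome)
            VALUES (@UserName, @Operation, 'DENIED');
            THROW 50001, 'Operation not authorized.', 1;
        END;

        BEGIN TRY
            BEGIN TRANSACTION;  -- atomicity around the dispatched work

            -- Dispatch to the procedure registered for this operation;
            -- registered procedures are assumed to take a single @Params xml.
            DECLARE @Proc sysname =
                (SELECT ProcName FROM dbo.OperationCatalog WHERE Operation = @Operation);
            EXEC @Proc @Params = @Params;

            COMMIT TRANSACTION;

            -- Usage and performance statistics.
            INSERT dbo.GatewayLog (UserName, Operation, Outcome, DurationMs)
            VALUES (@UserName, @Operation, 'OK',
                    DATEDIFF(millisecond, @Started, SYSUTCDATETIME()));
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

            -- Record both system and business errors, then re-raise.
            INSERT dbo.GatewayLog (UserName, Operation, Outcome, ErrorMessage)
            VALUES (@UserName, @Operation, 'ERROR', ERROR_MESSAGE());
            THROW;
        END CATCH;
    END;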

Workflow automation

By the nature of our clients’ businesses, some of the systems we build have very large computation processes that involve many thousands of distinct interdependent tasks, authored by multiple groups of analysts and users who are normally not in sync with each other, at least during the early stages of processing. Determining the correct sequence of execution of these tasks, and the overall synchronization state of the system, is an activity not well suited to manual control. We have developed a system that takes in information about computation tasks and their interdependencies, builds the dependency graph, and then manages the overall computation process and reports on the system synchronization state. It also collects execution performance statistics and provides time estimates for pending tasks.
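
For illustration, the heart of such a scheduler is a query over the dependency graph that finds the tasks that are ready to run. A minimal Transact SQL sketch, with hypothetical table and status names:

    -- Hypothetical schema: dbo.Task(TaskId, Status) and
    -- dbo.TaskDependency(TaskId, DependsOn), one row per graph edge.
    -- A task is ready to run when it is pending and none of its
    -- predecessors is still unfinished.
    SELECT t.TaskId
    FROM dbo.Task AS t
    WHERE t.Status = 'PENDING'
      AND NOT EXISTS
          (SELECT 1
           FROM dbo.TaskDependency AS d
           JOIN dbo.Task AS p ON p.TaskId = d.DependsOn
           WHERE d.TaskId = t.TaskId
             AND p.Status <> 'DONE');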

CLR Library

While working with Microsoft SQL Server, we do most of our programming in Transact SQL; however, sometimes it is just too awkward or too slow for the task (especially when it comes to user-defined functions). To compensate, we have developed an extensive CLR library of C# procedures and functions, covering: list-to-table and table-to-list transformations; regular expression-based pattern matching and replacement; period arithmetic in support of time-series computation; and validation, transformation and formatting of values of different types. We use this library in many of the systems we build, extending our ability to do even more processing in the database layer.
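
For example (the function names below are illustrative, not the actual names in our library), such CLR functions are called from Transact SQL like any native function:

    -- Table-valued: split a delimited list into a table and use it like one.
    SELECT v.Position, v.Item
    FROM dbo.clr_SplitList(N'GDP;CPI;PPI', N';') AS v;

    -- Scalar: regular expression-based replacement, which plain
    -- Transact SQL cannot do natively.
    SELECT dbo.clr_RegexReplace(SeriesName, N'\s+', N' ') AS Cleaned
    FROM dbo.Series;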

SQL wrapper for X-12/13-ARIMA

A significant amount of business data typically needs to be seasonally adjusted. The industry standard for this problem is the X-12/13-ARIMA program from the US Census Bureau. Since all of the data processing and analysis in our systems is done in SQL, we needed a convenient way to call this program directly from SQL stored procedures. Thus, we have developed an X-12/13-ARIMA wrapper in C# that is directly callable from MS SQL Server. It supports metadata-defined, flexible extraction of both the structured and unstructured output of the ARIMA program, where the unstructured output is extracted using layered regular expressions. We have also observed that our wrapper performs about 5 times faster than the natively provided SAS wrapper.
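
A hypothetical invocation (the procedure and parameter names are illustrative only, not our actual interface) shows the general shape of calling the wrapper from SQL:

    -- Hypothetical call to the CLR wrapper from a stored procedure.
    EXEC dbo.clr_X13Arima
        @SeriesId    = 12345,          -- series to adjust
        @SpecName    = N'DefaultX13',  -- metadata-defined X-13 specification
        @OutputTable = N'dbo.SeasonallyAdjusted';  -- where results are written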

Metadata Loader

The simplest way for analysts to create and manage metadata is to use MS Excel. We have developed templates and methods of work for specifying metadata in MS Excel, and a tool to transform MS Excel sheets into database tables. This tool first converts an MS Excel document into canonical XML, uploads the XML into the database, parses it, and finally creates database tables corresponding to the individual MS Excel sheets. Importantly, this allows analysts to work on metadata in MS Excel, where they feel most comfortable, while the database system operates on it as tables, without the need for any manual intervention.
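
The XML-shredding step can be sketched in Transact SQL as follows; the canonical XML layout shown is illustrative, not our actual schema:

    -- Sketch of the XML-shredding step; the canonical layout is illustrative.
    DECLARE @Doc xml = N'
    <Workbook>
      <Sheet name="Taxonomy">
        <Row><Cell col="Code">A</Cell><Cell col="Parent"></Cell></Row>
        <Row><Cell col="Code">A1</Cell><Cell col="Parent">A</Cell></Row>
      </Sheet>
    </Workbook>';

    -- Shred sheets/rows/cells into relational form; from here, each sheet
    -- is pivoted into its own database table.
    SELECT s.x.value('@name', 'nvarchar(128)')  AS SheetName,
           c.x.value('@col',  'nvarchar(128)')  AS ColumnName,
           c.x.value('.',     'nvarchar(max)')  AS CellValue
    FROM @Doc.nodes('/Workbook/Sheet') AS s(x)
    CROSS APPLY s.x.nodes('Row')  AS r(x)
    CROSS APPLY r.x.nodes('Cell') AS c(x);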

Aggregators

Aggregations are perhaps the most common form of computation encountered in business systems. We have developed a variety of metadata-driven aggregators that operate on multiple dimensions. All specifications for these aggregators are represented in metadata: which dimensions are present; which taxonomies exist, their membership and inter-element relationships; how these taxonomies change over time; which taxonomies are used for which dimensions; which aggregation steps involve which subsets of dimensions; the dependencies among the aggregation steps; and various other information. Our aggregators support a “completeness check” (to see if all children of a given aggregate have values) and direct specification of values for aggregates that fail to compute naturally; they also have mechanisms to control the computational explosions that are common in multi-dimensional situations. The aggregation engines are written in Transact SQL and are essentially applications of several breadth-first graph processing algorithms – some based on level-by-level graph rolls, and others based on pre-computed partial and full closures.
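
As a greatly simplified, single-dimension illustration of a level-by-level graph roll (the table layout is hypothetical), consider:

    -- Hypothetical tables: dbo.Taxonomy(Child, Parent, Lvl) with leaves at
    -- the deepest level, and dbo.Fact(Member, Value) initially holding
    -- leaf values only.
    DECLARE @Lvl int = (SELECT MAX(Lvl) FROM dbo.Taxonomy);

    WHILE @Lvl > 0
    BEGIN
        -- Roll the children at this level up into their parents.
        INSERT dbo.Fact (Member, Value)
        SELECT t.Parent, SUM(f.Value)
        FROM dbo.Taxonomy AS t
        JOIN dbo.Fact     AS f ON f.Member = t.Child
        WHERE t.Lvl = @Lvl
        GROUP BY t.Parent;

        SET @Lvl -= 1;
    END;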

Audits and drilldowns

The ability to explain how various results are computed is essential in financial systems – yet very difficult to achieve over millions of data points. We have developed a set of techniques and corresponding code modules that are used to support auditability and drilldowns in the systems we build. Our solutions can effectively handle very large graphs and support a variety of traversals, while still allowing one to concentrate on small areas at a time.
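
For illustration, a drilldown over a stored computation-lineage graph can be expressed as a recursive traversal; the table layout below is hypothetical:

    -- Hypothetical lineage table dbo.Derivation(OutputId, InputId): one row
    -- per edge saying that an output was computed from an input.
    DECLARE @CellId bigint = 42;  -- the cell being audited (illustrative)

    WITH Lineage AS
    (
        SELECT d.InputId, 1 AS Depth
        FROM dbo.Derivation AS d
        WHERE d.OutputId = @CellId

        UNION ALL

        SELECT d.InputId, l.Depth + 1
        FROM Lineage AS l
        JOIN dbo.Derivation AS d ON d.OutputId = l.InputId
    )
    SELECT InputId, Depth   -- everything that contributed to the cell
    FROM Lineage
    ORDER BY Depth, InputId;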

Screener

One of the important capabilities in dealing with large result sets is the ability to quickly find areas of interest. We have developed a screener tool that finds outliers relative to a particular locale within the aggregation graph and presents them to analysts within their taxonomic context.
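
One simple screening criterion of this kind (a sketch with hypothetical tables and an illustrative cutoff) flags values that deviate sharply from their siblings under the same parent:

    -- Hypothetical tables as in the aggregator sketch; the 3-sigma cutoff
    -- is illustrative.
    SELECT Parent, Member, Value
    FROM (
        SELECT t.Parent, f.Member, f.Value,
               AVG(f.Value)   OVER (PARTITION BY t.Parent) AS LocalMean,
               STDEV(f.Value) OVER (PARTITION BY t.Parent) AS LocalStd
        FROM dbo.Taxonomy AS t
        JOIN dbo.Fact     AS f ON f.Member = t.Child
    ) AS s
    WHERE LocalStd > 0
      AND ABS(Value - LocalMean) > 3 * LocalStd;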

Table and story builder

Most of the systems we build deal with multi-dimensional data that needs to be displayed in tables with complex row and column configurations, including multi-level column headers and row stubs, and demanding display/style characteristics. We have developed a system for creating such tables in which the table structure and aesthetics are completely defined by metadata and can be further refined at run time; moreover, the content of even a single table can be sourced from multiple disparate data providers. When used for screen-based displays, our tables also support click-based pass-through to computation audits and drilldowns. We support multiple output formats, all based on XML – one of which is HTML-compliant. We have also developed a superstructure on top of our tabling engine that allows one to create “stories,” or hierarchical arrangements of tables. Besides publishing, we have used these stories to completely automate the creation of front-end screens used for data entry and analysis. Our tabling engine is built entirely in SQL and has been used in numerous systems to define many thousands of tables.
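
As a rough sketch of the final rendering step (the metadata layout is hypothetical and greatly simplified), cell-level metadata can be serialized into the XML form from which the output formats are then derived:

    -- Hypothetical, greatly simplified metadata:
    -- dbo.TableCell(TableId, RowNo, ColNo, Style, Value).
    SELECT RowNo AS '@r',
           (SELECT ColNo AS '@c',
                   Style AS '@style',
                   Value AS 'text()'
            FROM dbo.TableCell AS c
            WHERE c.TableId = r.TableId AND c.RowNo = r.RowNo
            ORDER BY c.ColNo
            FOR XML PATH('cell'), TYPE)
    FROM (SELECT DISTINCT TableId, RowNo
          FROM dbo.TableCell
          WHERE TableId = 1) AS r
    ORDER BY RowNo
    FOR XML PATH('row'), ROOT('table');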

Balance of payments rule language

The balance of payments rule language (BPRL) was developed to support the needs of calculating detailed accounts. It allows analysts to specify computation formulas in an intuitive, MS Excel-like notation, and then evaluates them over the data in the database. BPRL supports: full arithmetic (+, -, * and /); binary comparators (=, !=, <, etc.) and complex parenthesized conditions (NOT, AND and OR) that work correctly in the presence of NULLs (i.e., 3-valued logic); aggregate and scalar functions; the ability to conditionally reconfigure formulas (i.e., to include or exclude portions of a formula) and to conditionally evaluate them (keep or NULL the results); and various additional features. The BPRL engine translates modules written in BPRL into labeled directed acyclic graph structures capturing formula interdependencies, loads them into database tables, and then acts as an interpreter over these formulas. Importantly, BPRL supports full auditability over its computations. Except for a small component that translates the BPRL rule files into an XML representation, the rest of the BPRL system is written entirely in Transact SQL.
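
For illustration only (the formula syntax and table layout shown are invented, not the actual BPRL grammar), a rule such as GOODS_BALANCE = EXPORTS - IMPORTS might compile into DAG tables along these lines:

    -- Invented DAG representation of a compiled formula such as
    --   GOODS_BALANCE = EXPORTS - IMPORTS
    -- Each node is an operator or a series reference; edges carry operand order.
    CREATE TABLE dbo.FormulaNode
    (
        NodeId   int PRIMARY KEY,
        NodeKind char(3)      NOT NULL,  -- 'OP' = operator, 'REF' = series reference
        OpSymbol nvarchar(4)  NULL,      -- '+', '-', '*', '/' for operator nodes
        SeriesId nvarchar(64) NULL       -- series name for reference nodes
    );
    CREATE TABLE dbo.FormulaEdge
    (
        ParentId   int NOT NULL REFERENCES dbo.FormulaNode(NodeId),
        ChildId    int NOT NULL REFERENCES dbo.FormulaNode(NodeId),
        OperandPos int NOT NULL          -- 1 = left operand, 2 = right operand
    );

    INSERT dbo.FormulaNode VALUES
        (1, 'OP',  N'-', NULL),          -- the subtraction
        (2, 'REF', NULL, N'EXPORTS'),
        (3, 'REF', NULL, N'IMPORTS');
    INSERT dbo.FormulaEdge VALUES (1, 2, 1), (1, 3, 2);

    -- The interpreter evaluates nodes bottom-up in dependency order; NULLs
    -- propagate per 3-valued logic, and the per-node results are retained
    -- for auditability.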

Integrated validation, auto-edit and search tool

The integrated validation, auto-edit and search tool (IVEST) is a system that allows analysts to specify a variety of criteria for searching through survey data, validating it, and correcting and/or enhancing it when certain errors are discovered or conditions are met. The rule language supported by IVEST has a natural, user-friendly syntax, yet is expressive enough to allow essentially any condition normally expressible within SQL. The IVEST execution engine is implemented in Transact SQL (using some CLR functions) and is essentially a code generator translating IVEST rules into efficient SQL queries.
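
As an invented illustration (neither the rule syntax nor the table names below are IVEST’s actual ones), a rule that imputes missing revenue might compile into an auto-edit like this:

    -- Invented rule:
    --   when REVENUE is missing and EMPLOYEES > 0
    --   set  REVENUE = EMPLOYEES * industry average of revenue per employee
    -- might compile into generated SQL along these lines:
    UPDATE s
    SET    s.Revenue = s.Employees * a.RevenuePerEmployee
    FROM   dbo.SurveyResponse  AS s
    JOIN   dbo.IndustryAverage AS a ON a.IndustryCode = s.IndustryCode
    WHERE  s.Revenue IS NULL
      AND  s.Employees > 0;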

Secondary suppression

Secondary suppression supports the non-disclosure of data cells in a multi-dimensional table space – a notoriously difficult problem. We have developed a suite of tools to assist with secondary suppression; the three most significant of these are: a tool for analyzing the current state of suppression, including reporting on “broken” cells; a tool for choosing candidates (based on a variety of criteria) for the additional suppression necessary to protect currently suppressed cells; and a suppression audit tool for showing why certain suppressions were chosen by the system and the various dependencies that exist among suppressed cells. Our suppression tools are both periodicity and history aware – i.e., suppressions are coordinated between annual and quarterly data, as well as with prior vintages/revisions. They also allow for analyst overrides (both positive and negative) of system-selected suppressions, and support an iterative, collaborative process between the analysts and the system in establishing the final secondary suppression pattern.
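
One of the simpler analysis checks can be sketched as follows (hypothetical tables): an aggregate with exactly one suppressed child is “broken,” because the hidden value can be recovered by subtracting the visible children from the published total:

    -- Hypothetical tables as in the aggregator sketch, with dbo.Fact
    -- extended by a Suppressed flag (0/1).
    SELECT t.Parent,
           COUNT(*) AS Children,
           SUM(CASE WHEN f.Suppressed = 1 THEN 1 ELSE 0 END) AS SuppressedChildren
    FROM dbo.Taxonomy AS t
    JOIN dbo.Fact     AS f ON f.Member = t.Child
    GROUP BY t.Parent
    HAVING SUM(CASE WHEN f.Suppressed = 1 THEN 1 ELSE 0 END) = 1;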

Security

Many of the systems we build house sensitive data that needs to be accessed and manipulated by many people yet whose security is paramount. The primary responsibility for security in our systems lies with the database gateway – it acts as the single point of entry into the database; provides the means for establishing sophisticated, fine-granularity data access and operation authorization schemes; and tracks all data-related activities in the system. It provides an effective means of securing our applications, yet is neither computationally heavy nor burdensome to administer. All of the tools that we have built – workflow automation, audit, drilldown, screener, tabling, etc. – are also security-aware.