Our Tools
Summary
As part of building systems for our clients, we frequently develop reusable techniques and tool-like software components to increase our productivity, improve system reliability and transparency, and ease the training and handover process. Most of these tools are independent of the business particulars of the application and can generally be used ready-made, instead of implementing all of their features de novo in each system. These tools are mature and have been used successfully in the numerous systems we have built.
Database Gateway
Our systems include large computational components built in SQL in the database layer. The database gateway encapsulates these systems and is responsible for coordinating, regulating, and streamlining the interactions between the non-database and database system layers. Specifically, it includes the following capabilities (a minimal sketch follows the list):
acts as a single point of entry into the database
supports error handling and recording for both system and business errors
supports collection, reporting and analysis of application usage and performance statistics
supports transactional atomicity and concurrency control
allows for establishment of sophisticated data access and operation authorization regimes
standardizes how the data is exchanged between the non-database and database layers
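For illustration, a minimal sketch of such a gateway entry point appears below, assuming hypothetical names (gw_Execute, gw_Authorization, gw_ErrorLog, gw_UsageLog are illustrative, not the actual gateway API). A single procedure receives every request, checks authorization, wraps the call in a transaction, and records both errors and usage metering:

CREATE PROCEDURE dbo.gw_Execute
    @UserId    int,
    @Operation sysname,   -- name of the inner stored procedure to run
    @Params    xml        -- operation parameters in a standard envelope
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @t0 datetime2 = SYSUTCDATETIME();

    -- authorization check against a (hypothetical) permissions table
    IF NOT EXISTS (SELECT 1 FROM dbo.gw_Authorization
                   WHERE UserId = @UserId AND Operation = @Operation)
    BEGIN
        RAISERROR('Operation %s is not authorized.', 16, 1, @Operation);
        RETURN;
    END;

    BEGIN TRY
        BEGIN TRANSACTION;
        -- dispatch to the requested operation; by convention every inner
        -- procedure accepts the same (@UserId, @Params) signature
        DECLARE @sql nvarchar(max) =
            N'EXEC dbo.' + QUOTENAME(@Operation) +
            N' @UserId = @UserId, @Params = @Params;';
        EXEC sp_executesql @sql, N'@UserId int, @Params xml',
             @UserId = @UserId, @Params = @Params;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        INSERT dbo.gw_ErrorLog (UserId, Operation, ErrorNumber, ErrorMessage, LoggedAt)
        VALUES (@UserId, @Operation, ERROR_NUMBER(), ERROR_MESSAGE(), SYSUTCDATETIME());
        THROW;   -- re-raise so the caller sees the failure
    END CATCH;

    -- usage metering, recorded on the success path
    INSERT dbo.gw_UsageLog (UserId, Operation, DurationMs, LoggedAt)
    VALUES (@UserId, @Operation,
            DATEDIFF(millisecond, @t0, SYSUTCDATETIME()), SYSUTCDATETIME());
END;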
Metadata Loader
The simplest way for analysts to create and manage metadata is to use Microsoft Excel spreadsheets. We have developed templates and methods of work for specifying metadata in Excel, as well as the ability to convert the spreadsheets into canonical XML and/or JSON and upload the result into the database, populating database tables corresponding to individual Excel sheets.
The tool provides a preview of metadata changes and supports a variety of metadata validation checks. Importantly, it allows analysts to work on metadata in Excel, where they feel most comfortable, while the database system operates on the metadata as tables, without any manual intervention.
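As a small illustration, once a sheet has been converted to canonical JSON, populating its corresponding table is a single set-based statement; the table and column names below are hypothetical:

DECLARE @json nvarchar(max) = N'[
  {"TaxonomyId": 1, "ElementCode": "US", "ParentCode": null},
  {"TaxonomyId": 1, "ElementCode": "NY", "ParentCode": "US"}
]';

INSERT dbo.md_TaxonomyElement (TaxonomyId, ElementCode, ParentCode)
SELECT TaxonomyId, ElementCode, ParentCode
FROM OPENJSON(@json)
     WITH (TaxonomyId int, ElementCode varchar(20), ParentCode varchar(20));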
Aggregators
Aggregation is perhaps the most common form of computation encountered in business systems. We have developed a variety of metadata-driven aggregators that operate on multiple dimensions. All specifications for these aggregators are represented in metadata, including:
the dimensions present
the existing taxonomies, their membership, and their inter-element relationships
how these taxonomies change over time
the mappings of dimensions to taxonomies
the dimensionality of individual aggregation steps
the dependencies among the aggregation steps
Our aggregators support an optional completeness verification to see if all children of a given aggregate have values, and direct specification of values for aggregates that fail to compute naturally. They also have mechanisms to control computational explosions that are common in multi-dimensional situations. The aggregation engines are written in Transact SQL and are essentially applications of several breadth-first graph processing algorithms – some based on level-by-level graph rolls, and others based on pre-computed partial and full closures.
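As a simplified illustration of the pre-computed-closure approach, the sketch below aggregates one hierarchical dimension, assuming a tree-shaped taxonomy and hypothetical tables dbo.Hierarchy (parent/child pairs) and dbo.Fact (leaf-level values):

WITH Closure AS (
    -- every node paired with itself ...
    SELECT DISTINCT ChildCode AS Node, ChildCode AS Descendant
    FROM dbo.Hierarchy
    UNION ALL
    -- ... plus all (ancestor, descendant) pairs reached by walking parent links
    SELECT h.ParentCode, c.Descendant
    FROM Closure c
    JOIN dbo.Hierarchy h ON h.ChildCode = c.Node
)
SELECT c.Node AS AggregateCode, SUM(f.Value) AS Value
FROM Closure c
JOIN dbo.Fact f ON f.LeafCode = c.Descendant   -- facts exist only at leaves
GROUP BY c.Node;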
Audits and drilldowns
The ability to explain how various results are actually computed is essential in statistical and financial systems – yet, very difficult to achieve over billions of data points. We have developed a set of techniques and corresponding code modules that are used to support auditability and drilldowns in the systems we build. Our solutions can handle very large computational graphs and support a variety of traversals, yet allow one to concentrate on small areas at a time.
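A single drilldown step can stay small no matter how large the graph is. Assuming a hypothetical schema, showing only the immediate children of the cell an analyst clicked on looks like:

DECLARE @DrilldownCode varchar(50) = 'A100';   -- the aggregate being inspected

SELECT h.ChildCode, v.Value
FROM dbo.Hierarchy h
LEFT JOIN dbo.CellValue v ON v.Code = h.ChildCode   -- NULL = child never computed
WHERE h.ParentCode = @DrilldownCode
ORDER BY h.ChildCode;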
Table and story builder
Most of the systems we build deal with multi-dimensional data that needs to be displayed in tables with complex row and column configurations, including multi-level column headers and row stubs, and demanding display/style characteristics. We have developed a system for creating such tables where the table structure and aesthetics are completely defined by metadata and can be further refined at run-time, and where the content of even a single table can be sourced from multiple disparate data providers.
When used for screen-based displays, our tables also support click-through to computation audits and drilldowns. We support multiple output formats, all based on XML – including HTML-compliant versions suitable for browsers. We have also developed a superstructure on top of our tabling engine that allows one to create stories – hierarchical arrangements of tables.
In addition to publishing, we have also used these stories to completely automate the creation of front-end screens used for data entry and analysis. Our tabling engine is built entirely in SQL and has been used in numerous systems to define many thousands of tables.
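As a small illustration of XML-based output, SQL Server's FOR XML PATH can emit HTML-compliant row fragments directly from a query; dbo.ReportRow and its columns are hypothetical:

SELECT td = RowLabel, '',          -- the empty column keeps adjacent
       td = FORMAT(Value, 'N0')    -- same-named elements from merging
FROM dbo.ReportRow
ORDER BY SortOrder
FOR XML PATH('tr');                -- yields <tr><td>...</td><td>...</td></tr>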
Inquiry Wizard
While analysts often need comprehensive access to detailed data, providing direct, unfettered database access to non-programmers is not without risks. To address this, we have developed an Inquiry Wizard (INW) utility – a dynamic, user-friendly, graphical facility for posing common questions.
INW allows business specialists to develop SQL stored procedures that pose such common questions, heavily parametrize them, and include them in INW. Non-technical analysts can then execute these inquiries against the database. We have also added a capability that allows users to interact with the inquiry responses via clicks or edits, making INW a platform for building interactive dashboards.
The inquiries are described by extensive metadata that allows INW to guide users in eliciting the needed parameters (which can be interdependent), provide proper security and access control, and save results in a variety of ways. We have successfully deployed INW in numerous systems.
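A hypothetical slice of such inquiry metadata (all names illustrative): each inquiry row names a stored procedure, and each parameter row tells INW how to elicit a value.

CREATE TABLE dbo.inw_Inquiry (
    InquiryId int           NOT NULL PRIMARY KEY,
    Title     nvarchar(200) NOT NULL,
    ProcName  sysname       NOT NULL    -- the stored procedure to execute
);

CREATE TABLE dbo.inw_Parameter (
    InquiryId   int           NOT NULL REFERENCES dbo.inw_Inquiry,
    ParamName   sysname       NOT NULL,
    Prompt      nvarchar(200) NOT NULL,
    SqlType     sysname       NOT NULL,  -- e.g. 'int', 'date'
    LookupQuery nvarchar(max) NULL,      -- populates a pick list; may reference
                                         -- previously elicited parameters
    PRIMARY KEY (InquiryId, ParamName)
);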
Workflow automation
By the nature of our clients’ businesses, some of the systems we build have large computation processes that involve thousands of distinct interdependent tasks authored by multiple groups of analysts and users. Identifying the correct execution sequence of these tasks, and determining the overall synchronization state of the system, is a non-trivial and laborious process.
We have developed a system that ingests information (metadata) about computation tasks and their interdependencies, builds the dependency graph, serializes it, and then manages the overall computation process and reports on the system's synchronization state. Significantly, the system can identify the exact subset of tasks that need to be incrementally recomputed to fully re-synchronize the system after a change. It also collects performance statistics of task executions and provides estimates for pending tasks.
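The incremental piece can be sketched as a forward walk over the dependency graph (hypothetical schema, acyclic graph assumed): starting from a changed task, find everything downstream that must be recomputed.

DECLARE @ChangedTaskId int = 42;   -- the task whose inputs changed

WITH Affected AS (
    SELECT TaskId FROM dbo.wf_Task WHERE TaskId = @ChangedTaskId
    UNION ALL
    SELECT d.DependentTaskId
    FROM Affected a
    JOIN dbo.wf_Dependency d ON d.PrerequisiteTaskId = a.TaskId
)
SELECT DISTINCT TaskId   -- DISTINCT: a task may be reachable along many paths
FROM Affected;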
IFACT
The integrated formula analysis and calculation tool (IFACT) was developed to support the calculation of detailed accounts. It allows analysts to specify computational formulas in an intuitive MS Excel-like notation, translates them into a labeled directed acyclic graph (capturing formula interdependencies), loads the graph into database tables, and then acts as an interpreter, evaluating this formula graph over the data in the database using techniques specifically optimized for SQL.
IFACT has significant expressive power covering: formulas over variable multi-dimensional spaces; full arithmetic (+, -, * and /); binary comparators (=, !=, <, etc.); complex parenthesized conditions (NOT, AND and OR) that also work correctly in the presence of NULLs (i.e., 3-valued logic); scalar and aggregate functions; ability to conditionally reconfigure formulas (i.e., to include/exclude portions of the formula) and to evaluate them (keep or NULL the results); and formula preferencing. Importantly, IFACT supports full auditability over its computations.
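A hypothetical fragment of the underlying storage (names invented for illustration): each formula is a node, and a dependency table records the edges of the DAG so the interpreter can evaluate targets in dependency order.

CREATE TABLE dbo.if_Formula (
    FormulaId  int           NOT NULL PRIMARY KEY,
    Target     varchar(50)   NOT NULL,   -- the account the formula computes
    Expression nvarchar(max) NOT NULL    -- e.g. N'[A100] = [A110] + [A120]'
);

CREATE TABLE dbo.if_Dependency (         -- edges of the formula DAG, derived
    FormulaId   int         NOT NULL     -- from the parsed expressions
                REFERENCES dbo.if_Formula,
    ReadsTarget varchar(50) NOT NULL,    -- an account this formula reads
    PRIMARY KEY (FormulaId, ReadsTarget)
);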
IVEST
The integrated validation, auto-edit and search tool (IVEST) is a system that allows analysts to specify, as metadata, a variety of criteria for searching through survey data, validating it, and correcting and/or enhancing it when certain errors are discovered or conditions are met. The rule language supported by IVEST has a natural user-friendly syntax, yet is expressive enough to essentially allow for any conditions normally expressible within SQL. The IVEST execution engine is implemented in Transact SQL (using some CLR functions) and is essentially a code-generator translating IVEST rules into efficient SQL queries.
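Purely for illustration, a hypothetical IVEST-style rule such as "if employment > 0 and wages = 0 then flag the record" would be code-generated into a single set-based query along these lines (table and columns invented):

SELECT SurveyId, RecordId
FROM dbo.SurveyRecord
WHERE Employment > 0
  AND Wages = 0;   -- one scan per rule instead of row-by-row checking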
Screener
One of the important capabilities in dealing with large result sets is the ability to quickly find areas of interest. We have developed a screener tool that can find outliers relative to a particular locale within the aggregation graph and present them to analysts within their taxonomical context.
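One screening criterion, sketched over a hypothetical schema: flag cells that deviate from their siblings under the same parent by more than a chosen number of standard deviations, i.e. outliers relative to their immediate taxonomical locale.

WITH Sibling AS (
    SELECT h.ParentCode, h.ChildCode, v.Value,
           AVG(v.Value)   OVER (PARTITION BY h.ParentCode) AS Mean,
           STDEV(v.Value) OVER (PARTITION BY h.ParentCode) AS Sd
    FROM dbo.Hierarchy h
    JOIN dbo.CellValue v ON v.Code = h.ChildCode
)
SELECT ParentCode, ChildCode, Value
FROM Sibling
WHERE Sd > 0
  AND ABS(Value - Mean) > 3 * Sd;   -- a 3-sigma screen; in practice the
                                    -- threshold would be metadata-driven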
Secondary Suppression
Secondary suppression supports the non-disclosure of data cells in a multi-dimensional table space – a notoriously difficult problem. We have developed a tool to assist with secondary suppression, including:
analysis of the current state of suppression, including reporting on “broken” cells
selection of candidates (based on a variety of criteria) for additional suppression necessary to protect currently suppressed cells
suppression audit capability for showing why certain suppressions were chosen by the system and the various dependencies that exist among suppressed cells
Our suppression tool is both periodicity and history aware – i.e., suppressions are coordinated between annual and quarterly data, as well as with prior vintages/revisions. The tool also allows for analyst overrides (both positive and negative) of system-selected suppressions and supports an iterative, collaborative process between the analysts and the system in establishing final secondary suppressions.
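One of the simplest "broken cell" checks can be sketched as follows (hypothetical schema): a suppression is exposed if it is the only suppressed child of its parent, since its value is then recoverable as the parent total minus the published siblings.

SELECT h.ParentCode,
       MIN(CASE WHEN c.IsSuppressed = 1 THEN h.ChildCode END) AS ExposedCell
FROM dbo.Hierarchy h
JOIN dbo.Cell c ON c.Code = h.ChildCode
GROUP BY h.ParentCode
HAVING SUM(CASE WHEN c.IsSuppressed = 1 THEN 1 ELSE 0 END) = 1;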
CLR Library
While working with Microsoft SQL Server, we do most of our programming in Transact SQL; however, sometimes Transact SQL is just too awkward or too slow for the task. To compensate, we have developed an extensive CLR library of C# procedures and functions, including regular expression-based pattern matching and replacing, period arithmetic in support of time-series computation, and validation, transformation and formatting of values of various data types. We use this library in many systems we build, further extending our ability to do even more processing in the database layer.
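On the Transact SQL side, binding one such function is a short declaration (assembly, class, and function names below are illustrative); once bound, it is used like any built-in function:

CREATE ASSEMBLY OurClrLib
FROM 'C:\lib\OurClrLib.dll' WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.RegexReplace
    (@input nvarchar(max), @pattern nvarchar(400), @replacement nvarchar(400))
RETURNS nvarchar(max)
AS EXTERNAL NAME OurClrLib.[OurClrLib.StringFunctions].RegexReplace;
GO
SELECT dbo.RegexReplace(N'(555) 123-4567', N'[^0-9]', N'');  -- -> 5551234567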
SQL wrapper for X-12/13-ARIMA
A significant amount of business data typically needs to be seasonally adjusted. The industry standard for this problem is the X-12/13-ARIMA program from the US Census Bureau. Since all data processing and analysis in our systems is done in SQL, we needed a convenient way to call this program directly from SQL stored procedures. We have therefore developed an X-12/13-ARIMA wrapper in C# that is directly callable from MS SQL Server. It supports metadata-defined flexible extraction of both structured and unstructured output of the ARIMA program, where unstructured output is extracted using layered regular expressions. We have also observed that our wrapper performs about 5 times faster than the natively provided SAS wrapper.
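A hypothetical usage sketch (procedure and parameter names invented): because the wrapper is exposed to Transact SQL, a seasonal-adjustment run becomes just another step in a stored procedure.

EXEC dbo.x13_Adjust
     @SeriesId    = 'CES0500000001',   -- the series to adjust
     @SpecName    = 'default_x11',     -- named, metadata-defined spec/extraction
     @OutputTable = 'dbo.sa_Result';   -- structured output lands here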
Supplementing SQL with C#, Python and R
The systems Omnicom builds typically have most of their business logic implemented in the database layer. However, we sometimes encounter computations where SQL is simply not the best language for the task – both in terms of clarity and efficiency. In such cases we have employed other languages – initially C#, and more recently Python and R – to supplement SQL capabilities. Of note, the modules written in these languages are executed on the same database server and are under the operational control of the SQL stored procedures. We have developed standard protocols allowing these non-SQL components to contribute error information and metering to the calling stored procedures to facilitate an integrated view of the entire system’s operation. We have successfully implemented an advanced version of the Denton interpolator, a balancing engine, data imputations, as well as several data visualization tools using this approach.
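SQL Server's Machine Learning Services illustrates the general pattern (the sketch below shows the built-in mechanism with invented table and column names, not our specific protocol): the script runs on the database server, under the control of the calling Transact SQL, over a result set passed in as a data frame.

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
df = InputDataSet                        # the SQL result set, as a DataFrame
df["Growth"] = df["Value"].pct_change()  # awkward to express in plain SQL
OutputDataSet = df
',
    @input_data_1 = N'SELECT Period, Value FROM dbo.Series ORDER BY Period'
WITH RESULT SETS ((Period int, Value float, Growth float));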
Security
Many of the systems we build house sensitive data that must be accessed and manipulated by many people, yet whose security is paramount. The primary responsibility for security in our systems lies with the database gateway – it acts as a single point of entry into the database; provides means for establishing sophisticated, fine-granularity data access and operation authorization schemes; and tracks all data-related activities in the system. It provides an effective means for securing our applications, yet it is not computationally heavy or burdensome to administer. All the tools that we have built – workflow automation, audit, drilldown, screener, tabling, etc. – are also security-aware.