How far away from Six Sigma is the global automated trading infrastructure?

A Six Sigma process is one in which 99.99966% of the products manufactured are statistically expected to be free of defects (3.4 defects per million).  Six Sigma is considered to be world-class quality.  So, the question is:  what would it take to get the automated trading industry to be at the Six Sigma level?

To properly answer this question, we need to decide how to calculate a software error rate. A standard quality-metric calculation applies here: divide the total number of errors by the total number of messages sent to the bid-ask queue.

Let’s assume that the total number of messages is approximately 300,000,000 messages per day per large exchange.  For simplicity's sake, let’s also assume there are 5 large exchanges in the world.  So, at roughly 252 trading days per year, the total number of messages per year might be approximately 300,000,000 × 5 × 252 = 378,000,000,000.

From a cursory scan of news items, it appears that from 8/1/2011 to 7/31/2012 there were 31 “possible” problems worldwide.

If we assume that a single problem involves 10,000 bad messages, then the total number of bad messages is 31 × 10,000 = 310,000 per year.

The calculation of the software error rate is then 310,000 / 378,000,000,000 = 8.20E-07.

The calculation for a Six Sigma process is 3.4 / 1,000,000 = 3.40E-06.

Therefore, based on these simple assumptions, the error rate of 8.20E-07 is below the Six Sigma threshold of 3.40E-06, so the quality of the automated trading industry exceeds the Six Sigma level.
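The arithmetic above can be reproduced with a short Python sketch. The constant and function names are illustrative, and the figure of 252 trading days per year is an inference from the 378,000,000,000 total rather than a number stated in the text. Every input is a rough assumption, not measured data.

```python
# All inputs are the rough assumptions from the text, not measured data.
MESSAGES_PER_DAY_PER_EXCHANGE = 300_000_000
LARGE_EXCHANGES = 5
TRADING_DAYS_PER_YEAR = 252       # inferred: 378e9 / (300e6 * 5) = 252
INCIDENTS_PER_YEAR = 31           # "possible" problems, 8/1/2011 to 7/31/2012
BAD_MESSAGES_PER_INCIDENT = 10_000
SIX_SIGMA_DEFECT_RATE = 3.4 / 1_000_000


def annual_messages(per_day: int, exchanges: int, trading_days: int) -> int:
    """Total messages per year across all assumed exchanges."""
    return per_day * exchanges * trading_days


def error_rate(bad_messages: int, total_messages: int) -> float:
    """Fraction of messages that are bad."""
    return bad_messages / total_messages


total_messages = annual_messages(
    MESSAGES_PER_DAY_PER_EXCHANGE, LARGE_EXCHANGES, TRADING_DAYS_PER_YEAR
)
bad_messages = INCIDENTS_PER_YEAR * BAD_MESSAGES_PER_INCIDENT
rate = error_rate(bad_messages, total_messages)

print(f"total messages per year: {total_messages:,}")
print(f"bad messages per year:   {bad_messages:,}")
print(f"error rate:              {rate:.2e}")
print(f"Six Sigma threshold:     {SIX_SIGMA_DEFECT_RATE:.2e}")
print("meets Six Sigma" if rate <= SIX_SIGMA_DEFECT_RATE else "misses Six Sigma")
```

Keeping each assumption as a named constant makes it easy to change one input at a time and see how the conclusion moves.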

My question is then, where did I go wrong in my assumptions?
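One way to probe the assumptions is to ask how large a single incident would have to be before the conclusion flips. The sketch below is a rough sensitivity check, not a definitive analysis. It reuses the 378,000,000,000 messages per year and 31 incidents from above, varies only the assumed number of bad messages per incident, and uses illustrative function names.

```python
# Sensitivity check on the "bad messages per incident" assumption.
# All inputs are the rough figures from the text, not measured data.
TOTAL_MESSAGES_PER_YEAR = 378_000_000_000
INCIDENTS_PER_YEAR = 31
SIX_SIGMA_DEFECT_RATE = 3.4 / 1_000_000


def error_rate(bad_per_incident: float, incidents: int, total: int) -> float:
    """Error rate implied by a given number of bad messages per incident."""
    return bad_per_incident * incidents / total


def break_even_bad_per_incident(total: int, incidents: int, target: float) -> float:
    """Bad messages per incident at which the error rate equals the target."""
    return target * total / incidents


break_even = break_even_bad_per_incident(
    TOTAL_MESSAGES_PER_YEAR, INCIDENTS_PER_YEAR, SIX_SIGMA_DEFECT_RATE
)
print(f"break-even bad messages per incident: {break_even:,.0f}")

for assumed in (1_000, 10_000, 100_000, 1_000_000):
    rate = error_rate(assumed, INCIDENTS_PER_YEAR, TOTAL_MESSAGES_PER_YEAR)
    verdict = "meets" if rate <= SIX_SIGMA_DEFECT_RATE else "misses"
    print(f"{assumed:>9,} bad messages per incident -> {rate:.2e} ({verdict} Six Sigma)")
```

Under these inputs the break-even point is a little over 41,000 bad messages per incident, so the conclusion depends heavily on whether 10,000 is a realistic figure for a typical incident.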

The second question is, could it be that the problem is not the poor quality of the trading industry, but rather the sensationalizing of the errors by the media?

What does a quality management system in AT look like?

For automated trading, a quality management system includes processes to achieve prudent research, design, development, operation, and control of AT systems. This covers critical activities including quantitative modeling, risk control techniques, backtesting, simulated trading, and probationary trading. It also includes processes for, and documentation of, software and hardware testing that demonstrate that an AT system functions properly, is operationally safe, and is robust enough to behave acceptably during potential extreme events. Statistical methods for evaluating the stability of AT systems and for real-time monitoring have been developed.

What process should the firm go through to do all these things and justify its belief in the stability of the system? The K|V methodology (i.e. a study of methods) describes one such process (see Quality Money Management, Elsevier/Academic Press, 2008).

AT 9000 is agnostic with respect to research, development, operation, and control methods. Thus, as a study of methods, K|V is not a prescriptive method itself.  Nevertheless, all firms engaged in AT do (or should) engage in the activities described, though not necessarily in the sequence of stages and steps shown.  Firms should perform their own study of methods, and define internal processes that satisfy a quality policy and quality objectives. These processes will be unique to each firm and its organizational environment, and potentially to each AT system R&D project. The intent is not to imply uniformity in the structure of AT firms’ quality management systems or uniformity of documentation.

The ability of AT firms to prove the stability of their systems also depends upon the availability of execution venue (exchange) simulation facilities to fully test those systems. Such simulation facilities must enable testing against all manner of extreme market and infrastructure events.

By achieving a QMS that follows AT 9000, automated trading firms should be able to satisfy their organizational obligations to prove and document that their AT strategies and technologies will operate safely and profitably. There is also a wide body of literature demonstrating that the use of quality management systems improves financial performance.

What is the AT firm’s organizational responsibility?

The SEC and the CFTC have recently lowered the bar for proving market manipulation from intent to recklessness, implying (in the case of AT, necessarily organizational) imprudence or irresponsibility.  So, in the case of failure of an AT system, how can the organization prove it was responsible, that it was prudent in its AT research and development (R&D) and operation and control (O&C)?  The answer is that it was responsible because it followed a recognizably prudent process, one that proved and documented that the firm was justified in believing in the future performance (i.e. stability) of its AT system.

AT systems make decisions based on proven research.  As such these systems can only modify the outcomes of these decisions using the structures embedded in the software (i.e. real-time risk control).

How do you know your trading system will work?  What passes as proven research?  The obligation of the AT firm is to prove and document that an AT system’s trading strategy and technology will operate in line with expectations and to specification.  Prudence demands that the firm prove that its systems will run in control.  This obligation can be satisfied by following a prudent process that justifies expectations in the performance of the AT system.

What are the responsibilities of AT firms?

People involved in AT now have both internal responsibilities to their firm and its profitability and external responsibilities to ensure the safe operation of their systems.  What’s problematic is that there are many different, and often competing, views on what the responsibilities are or should be in AT.

AT is an interdisciplinary endeavor requiring the input of traders, computer engineers, and quants.  Each of these disciplines has its own perspective. Traders, for example, often take seriously their principal function and obligation to maintain orderly markets.  Computer engineers have their own codes which require avoidance of unsafe practices and fail-safe design.  (These concepts are most often embedded within the topic of software quality.)  Responsibilities in quantitative analysis revolve around staying within the strategic bounds defined in exchange rules and government regulation and, furthermore, are largely thought to be superseded by adherence to mathematical truth.

Additional perspectives are added to the AT sphere by people and organizations outside the AT firm as well.  The exchanges have their perspectives, as certainly do people in different parts of the world.  The following figure shows the perspectives involved in AT:

These perspectives may sometimes be in conflict with each other.  Thus, different AT firms may recognize different responsibilities based upon the internal political dominance of one profession.  No framework exists in AT that considers cross-disciplinary responsibilities of safety to those who might be harmed—external market participants and society.  The new discussion needs to focus on organizational responsibilities.  Likewise, as the global trading network spans multiple AT firms, exchanges and countries, it is important also to consider the industry-wide obligations to create confidence in financial markets and their sustainability. The profitability of any individual firm cannot be more important than the safety of the global trading mechanism.

What are examples of conflicts in AT development and operation?

The need for low latency gives rise to a conflict between speed (necessary for profitability) and the inclusion of fail-safe code that may add latency (necessary for the safety of external stakeholders).  An inherent conflict also exists between minimizing costs and satisfying obligations such as paying for research and development of real-time risk controls and/or redundant systems.  As time to market for an AT system matters, production pressure can also lead to the launch of risky trading systems.  The need for profitable AT systems cannot take precedence over the quality (stability and reliability) of the global system.