
Statistical Metrics

Jul 8, 2021

This article explains the backtest metrics used to describe and evaluate the characteristics and the performance of an anomaly or of a portfolio of anomalies.


N°Years: the number of years of history of a financial instrument.

N°Trades: the total number of trades.

N°Trades 1y: the average number of trades per calendar year.

Avg Trade Duration (days): the average duration of a single trade, in calendar days.
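
The three trade-count metrics can be sketched as follows. This is a minimal illustration, not ForecastCycles' code: the `(entry_date, exit_date)` input format and the name `trade_count_stats` are hypothetical, and N°Trades 1y is assumed to be the total number of trades divided by the number of calendar years spanned by the trade entries.

```python
from datetime import date
from statistics import mean


def trade_count_stats(trades):
    """Return (N°Trades, N°Trades 1y, Avg Trade Duration in days).

    `trades` is a hypothetical list of (entry_date, exit_date) pairs.
    """
    n_trades = len(trades)
    entry_years = [entry.year for entry, _ in trades]
    # Assumption: count every calendar year between the first and last entry,
    # including years with no trades.
    n_years = max(entry_years) - min(entry_years) + 1
    trades_per_year = n_trades / n_years
    # Durations in calendar days (weekends and holidays included).
    avg_duration = mean((exit_ - entry).days for entry, exit_ in trades)
    return n_trades, trades_per_year, avg_duration


# Made-up trades, for illustration only.
trades = [
    (date(2020, 1, 6), date(2020, 1, 7)),
    (date(2020, 6, 1), date(2020, 6, 5)),
    (date(2021, 3, 1), date(2021, 3, 2)),
]
print(trade_count_stats(trades))
```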


Total[R]: the total return, in percent.

Total profit: the total profit, in dollars, starting from an initial capital of $100.

Avg [R] annualized: the average trade return, in percent, annualized.
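
A minimal sketch of Total[R], Total profit and Avg [R] annualized, with returns expressed in percent. Two assumptions are not stated in the article: trades are compounded on the full equity, and annualization scales the average trade return linearly by 365 / average trade duration. The function names are hypothetical.

```python
from statistics import mean


def total_return_pct(trade_returns):
    """Total[R] in percent, assuming each trade reinvests the whole equity."""
    equity = 1.0
    for r in trade_returns:
        equity *= 1.0 + r / 100.0  # returns are expressed in percent
    return (equity - 1.0) * 100.0


def total_profit(trade_returns, initial_capital=100.0):
    """Total profit in dollars, starting from a capital of 100."""
    return initial_capital * total_return_pct(trade_returns) / 100.0


def avg_return_annualized(trade_returns, avg_duration_days):
    """Avg [R] annualized, using simple (non-compounded) scaling to 365 calendar days."""
    return mean(trade_returns) * 365.0 / avg_duration_days
```

With a +10% trade followed by a -10% trade, for instance, the compounded Total[R] is -1%, not 0%, which is why the compounding assumption matters.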

Avg [R] 1y: the average yearly return of the trades.

Stdev [R] 1y: the standard deviation of the yearly returns.

RR 1y: the yearly Reward / Risk (RR) ratio, i.e. the ratio between the average yearly return and the standard deviation of the yearly returns.

Winning % 1y: the percentage of years with a positive return.
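
The four yearly metrics can be computed from one return per calendar year. This sketch assumes the yearly returns are already aggregated (in percent) and uses the sample standard deviation; both choices are assumptions, and `yearly_metrics` is a hypothetical name.

```python
from statistics import mean, stdev


def yearly_metrics(yearly_returns):
    """yearly_returns: one return per calendar year, in percent."""
    avg = mean(yearly_returns)
    sd = stdev(yearly_returns)  # sample standard deviation (an assumption)
    winning_pct = 100.0 * sum(r > 0 for r in yearly_returns) / len(yearly_returns)
    return {
        "Avg [R] 1y": avg,
        "Stdev [R] 1y": sd,
        "RR 1y": avg / sd,
        "Winning % 1y": winning_pct,
    }
```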


Avg [R]: the average return per trade.

Stdev [R]: the standard deviation of the trade returns.

RR: the Reward / Risk (RR) ratio, i.e. the ratio between the average trade return and the standard deviation of the trade returns.

Winning %: the percentage of winning trades.

PF: the Profit Factor (PF) is an index of trading quality that summarizes, in a single number, the relationship between the losses incurred and the results achieved. It is computed by dividing the sum of the profits of the winning trades by the absolute value of the sum of the losses of the losing trades; a PF above 1 means that total profits exceeded total losses.
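
The trade-level metrics, including the Profit Factor, can be sketched from a list of per-trade returns in percent. Computing PF on percentage returns rather than on dollar profits, and using the sample standard deviation, are assumptions of this sketch; `trade_metrics` is a hypothetical name.

```python
from statistics import mean, stdev


def trade_metrics(trade_returns):
    """trade_returns: per-trade returns, in percent."""
    avg = mean(trade_returns)
    sd = stdev(trade_returns)  # sample standard deviation (an assumption)
    wins = [r for r in trade_returns if r > 0]
    losses = [r for r in trade_returns if r < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # absolute value of the summed losses
    pf = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    return {
        "Avg [R]": avg,
        "Stdev [R]": sd,
        "RR": avg / sd,
        "Winning %": 100.0 * len(wins) / len(trade_returns),
        "PF": pf,
    }
```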

Stability: the stability of the backtest's equity line. It ranges from 0% (minimum stability) to 100% (maximum stability). A high stability means that the equity line has risen steadily and linearly over time.
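
The article does not specify how Stability is computed. One common way to quantify a "steady linear rise" is the R² of a straight-line fit of the logarithm of the equity line against time, expressed in percent. The sketch below uses that definition purely as an assumption; `stability_pct` is a hypothetical name.

```python
from math import log


def stability_pct(equity):
    """Assumed definition: R^2 (in %) of a least-squares line fitted to log(equity) over time."""
    n = len(equity)
    if n < 3:
        raise ValueError("need at least 3 equity points")
    y = [log(v) for v in equity]  # equity values must be positive
    x = range(n)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if syy == 0:
        raise ValueError("equity line is constant")
    return 100.0 * sxy * sxy / (sxx * syy)
```

A plain R² is also close to 100% for a steadily falling equity line, so a practical implementation would likely combine it with a check on the sign of the slope.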


Sharpe: the 'Sharpe ratio' is a metric that evaluates the risk-adjusted return of the anomaly. It indicates how well the anomaly has performed compared with a 'Risk-Free' rate of return. It is computed as the ratio between the anomaly's average yearly excess return over the risk-free rate (the US 3-month T-Bill rate) and the standard deviation of the anomaly's yearly returns.

Sortino: the 'Sortino ratio' is a metric that evaluates the risk-adjusted return of the anomaly. It differs from the Sharpe ratio only in the denominator: it considers only the standard deviation of the downside returns, rather than that of all (upside and downside) returns.
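
Both ratios can be sketched from aligned lists of yearly returns and yearly risk-free rates, in percent. As described above, the Sharpe denominator is the standard deviation of the anomaly's yearly returns. For the Sortino denominator, this sketch assumes the downside deviation of the yearly excess returns with a target of 0 (only negative excess returns contribute, averaged over all years); the article does not give the exact convention, and `sharpe_sortino` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_sortino(yearly_returns, yearly_rf):
    """Return (Sharpe, Sortino) from yearly returns and yearly risk-free rates, in percent."""
    excess = [r - rf for r, rf in zip(yearly_returns, yearly_rf)]
    avg_excess = mean(excess)
    # Sharpe: the denominator is the standard deviation of the yearly returns.
    sharpe = avg_excess / stdev(yearly_returns)
    # Sortino (assumed convention): downside deviation of the excess returns with
    # a target of 0; positive years count as 0 and the mean is over all years.
    downside = sqrt(mean(min(0.0, e) ** 2 for e in excess))
    sortino = avg_excess / downside if downside > 0 else float("inf")
    return sharpe, sortino
```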

Avg Dwn: the average drawdown of the backtest's equity line. The lower the average drawdown, the closer the equity line has stayed to its all-time highs.

Max Dwn: the maximum drawdown of the equity line, i.e. its largest peak-to-trough decline.

Max Dwn / Avg [R] 1y: the ratio between the anomaly's maximum drawdown and its average yearly return. It expresses how many years it could take, earning the average yearly return, to recover from a drawdown equal to the maximum historical drawdown.
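
The three drawdown metrics can be sketched from an equity line. Here drawdowns are measured as positive percentages below the running peak, and Avg Dwn is assumed to average over every point of the equity line, including the 0% points at new highs; both conventions are assumptions, and `drawdown_metrics` is a hypothetical name.

```python
def drawdown_metrics(equity, avg_yearly_return_pct):
    """Return (Avg Dwn %, Max Dwn %, Max Dwn / Avg [R] 1y) for an equity line."""
    peak = equity[0]
    drawdowns = []
    for value in equity:
        peak = max(peak, value)  # running all-time high
        drawdowns.append((1.0 - value / peak) * 100.0)  # % below the peak
    avg_dwn = sum(drawdowns) / len(drawdowns)  # includes 0% points at new highs
    max_dwn = max(drawdowns)
    # Rough number of years needed to recover Max Dwn at the average yearly return.
    years_to_recover = max_dwn / avg_yearly_return_pct
    return avg_dwn, max_dwn, years_to_recover
```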

Z-Score streaks: the Z-Score streaks measures how likely it is that our streaks of trades (consecutive wins and consecutive losses) are random. It usually ranges between -3 and +3, but it can occasionally go above or below these levels. A Z-Score of 0 means that the sequence of wins and losses is consistent with completely random results. A positive Z-Score means that a winning trade is likely to be followed by a losing one, and a losing trade by a winning one, so long winning and losing streaks are unlikely. Conversely, a negative Z-Score means that a winning trade is likely to be followed by more winning trades, and a losing trade by more losing trades, so winning and losing streaks are likely.
For example, if the last trade was a winner, the next one can be expected to be: (1) a loser if the Z-Score is near +3; (2) a winner or a loser with 50% probability each if the Z-Score is near 0; (3) a winner if the Z-Score is near -3.
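
The article does not state its formula, but the behaviour described above matches the runs-test Z-Score commonly used for trade streaks, where N is the number of trades, R the number of runs (blocks of consecutive identical outcomes) and P = 2 * wins * losses: Z = (N * (R - 0.5) - P) / sqrt(P * (P - N) / (N - 1)). More runs than expected (alternating results) give a positive Z, fewer runs (long streaks) give a negative Z. The sketch below is an assumption-based illustration; `z_score_streaks` is a hypothetical name.

```python
from math import sqrt


def z_score_streaks(outcomes):
    """Runs-test Z-Score of trade outcomes (True = winning trade, False = losing trade)."""
    outcomes = list(outcomes)
    n = len(outcomes)
    wins = sum(outcomes)
    losses = n - wins
    if wins == 0 or losses == 0 or n < 3:
        raise ValueError("need both wins and losses and at least 3 trades")
    # A run is a maximal block of consecutive identical outcomes.
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    p = 2 * wins * losses
    numerator = n * (runs - 0.5) - p
    denominator = sqrt(p * (p - n) / (n - 1))
    return numerator / denominator


print(z_score_streaks([True, False] * 5))         # alternating results: positive Z
print(z_score_streaks([True] * 5 + [False] * 5))  # long streaks: negative Z
```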

C-VaR: the CVaR (also called 'Conditional Value at Risk' or 'expected shortfall') is the average of the worst centile (i.e. the worst 1%) of the daily returns. Because it averages the worst daily historical returns, it is a measure of tail risk. CVaR is obtained by averaging the 'extreme' losses in the tail of the distribution of returns, beyond the value at risk (VaR) cutoff point. It is used in portfolio optimization for effective risk management.
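
A minimal historical CVaR sketch: sort the daily returns, take the worst centile (1%) by default, and average it. Rounding the tail size up to at least one observation is an assumption, as is the name `cvar_pct`; the result is in the same units as the input returns.

```python
import math


def cvar_pct(daily_returns, tail=0.01):
    """Average of the worst `tail` fraction of daily returns (worst centile by default)."""
    ordered = sorted(daily_returns)  # most negative returns first
    k = max(1, math.ceil(len(ordered) * tail))  # size of the tail beyond the VaR cutoff
    return sum(ordered[:k]) / k
```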


Excess Metrics (anomalies)

Exc. Avg [R] ann.: the difference between the annualized average gross return of the anomaly and that of the 'other trades'.

Exc. Avg [R]: the difference between the average gross return of the anomaly and that of the 'other trades'.

Exc. RR: the difference between the Reward / Risk of the anomaly and that of the 'other trades'.

Exc. Winning %: the difference between the winning percentage of the anomaly and that of the 'other trades'.

Note that:

  • Excess metrics are computed relative to the 'other trades'.
  • The 'other trades' are trades in the same instrument but in periods different from the anomaly's.
  • For example, the other trades of "Apple TDW 1" are "Apple TDW 2,3,4,5".
  • The 'other trades' are duration-adjusted according to the average trade duration of the anomaly. In the example above, the returns of "Apple TDW 2,3,4,5" are divided by 4 to make them comparable to the returns of "Apple TDW 1".
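
The duration adjustment and the excess metrics can be sketched as below. The linear scaling factor (anomaly average duration / other-trades average duration) reproduces the article's example: with durations of 1 and 4, it divides the "Apple TDW 2,3,4,5" returns by 4. Treating it as a general rule is an assumption; the function names are hypothetical, and Exc. Avg [R] ann. would follow the same pattern using annualized averages.

```python
from statistics import mean, stdev


def duration_adjust(other_returns, other_avg_duration, anomaly_avg_duration):
    """Rescale 'other trades' returns to the anomaly's average trade duration (linear scaling)."""
    factor = anomaly_avg_duration / other_avg_duration
    return [r * factor for r in other_returns]


def excess_metrics(anomaly_returns, other_returns):
    """Anomaly metric minus 'other trades' metric, for Avg [R], RR and Winning %."""
    def stats(rs):
        avg = mean(rs)
        return {
            "Avg [R]": avg,
            "RR": avg / stdev(rs),  # sample standard deviation (an assumption)
            "Winning %": 100.0 * sum(r > 0 for r in rs) / len(rs),
        }

    a, o = stats(anomaly_returns), stats(other_returns)
    return {"Exc. " + key: a[key] - o[key] for key in a}


# Made-up returns, for illustration only: other trades last 4 times as long.
others = duration_adjust([4.0, -2.0, 8.0], other_avg_duration=4, anomaly_avg_duration=1)
print(excess_metrics([1.0, 2.0, -1.0, 2.0], others))
```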

Excess Metrics (portfolios)

Exc. Avg [R] 1y on Bench.: the average yearly return of the portfolio minus that of the benchmark.

Exc. Avg [R] 1y on RF: the average yearly return of the portfolio minus that of the Risk-Free rate.

Exc. RR 1y on Bench.: the yearly Reward / Risk of the portfolio minus that of the benchmark.

Winning % 1y on Bench.: the percentage of years in which the portfolio's return was higher than the benchmark's.

Winning % 1y on RF: the percentage of years in which the portfolio's return was higher than the Risk-Free rate.

Note that:

  • Excess metrics are computed either 'on benchmark' or 'on Risk-free rate'.
  • The 'benchmark' is the S&P 500.
  • The 'Risk-free rate' is the annualized US 3-month T-Bill rate.
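
The portfolio excess metrics compare aligned yearly series. This sketch assumes lists of yearly returns in percent for the portfolio, the S&P 500 benchmark and the annualized 3-month T-Bill rate, one value per calendar year; the function name and input layout are hypothetical.

```python
from statistics import mean, stdev


def portfolio_excess(portfolio, benchmark, risk_free):
    """Yearly returns in percent, aligned by calendar year."""
    def rr(rs):
        return mean(rs) / stdev(rs)  # yearly Reward / Risk (sample stdev assumed)

    n = len(portfolio)
    return {
        "Exc. Avg [R] 1y on Bench.": mean(portfolio) - mean(benchmark),
        "Exc. Avg [R] 1y on RF": mean(portfolio) - mean(risk_free),
        "Exc. RR 1y on Bench.": rr(portfolio) - rr(benchmark),
        "Winning % 1y on Bench.": 100.0 * sum(p > b for p, b in zip(portfolio, benchmark)) / n,
        "Winning % 1y on RF": 100.0 * sum(p > f for p, f in zip(portfolio, risk_free)) / n,
    }
```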

Score and Rating

Score: the 'Score' is the inverse of the p-value (score = 1 / p-value) obtained from a statistical test, which is chosen according to the distribution of the returns. A lower p-value (stronger statistical significance) therefore gives a higher Score. The 'Rating' is derived from the 'Score'.

There are two types of scores:

  • 'Score on zero' (for 'anomalies' and 'portfolios'): measures how significantly a set of returns differs from a set of returns with zero mean.
  • 'Score on others' (for 'anomalies' only): measures how significantly a set of returns differs from another set of returns (the 'other trades').
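
The article chooses the statistical test according to the distribution of returns and does not name it. As a stand-in, the sketch below uses a large-sample, two-sided z-test (normal approximation) from the Python standard library: a one-sample test against a zero mean for 'Score on zero', and a two-sample test with separate variances for 'Score on others'. The Score is then 1 / p-value. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def _two_sided_p(z):
    # Two-sided p-value under the standard normal distribution.
    return 2.0 * NormalDist().cdf(-abs(z))


def score_on_zero(returns):
    """1 / p-value for the null hypothesis that the mean return is zero."""
    z = mean(returns) / (stdev(returns) / sqrt(len(returns)))
    return 1.0 / _two_sided_p(z)


def score_on_others(returns, other_returns):
    """1 / p-value for the null hypothesis that both sets of returns have the same mean."""
    se = sqrt(stdev(returns) ** 2 / len(returns)
              + stdev(other_returns) ** 2 / len(other_returns))
    z = (mean(returns) - mean(other_returns)) / se
    return 1.0 / _two_sided_p(z)
```

With few trades the normal approximation overstates significance; a Student's t-test (or a non-parametric test for skewed returns) would be the more careful choice.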

Rating: the 'Rating' ranges from 0 to 5 and is derived from the Score: each Rating value is assigned according to the cluster into which the 'Score' falls. The higher the Rating, the more statistically significant the difference between the anomaly's returns and zero.

Rating FC: a variation of the Rating, computed as the weighted average of three different Ratings over the same period (IS = In-Sample, OS = Out-of-Sample, or ALL = the whole period). Its formula is: 0.4 * 'Rating Net' + 0.4 * 'Rating on Others' + 0.2 * 'Rating Gross'.
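
The Rating FC formula translates directly into code. The 0 to 5 range check reflects the Rating scale described above; the function name is hypothetical.

```python
def rating_fc(rating_net, rating_on_others, rating_gross):
    """Rating FC = 0.4 * Rating Net + 0.4 * Rating on Others + 0.2 * Rating Gross.

    All three ratings must refer to the same period (IS, OS or ALL).
    """
    for r in (rating_net, rating_on_others, rating_gross):
        if not 0 <= r <= 5:
            raise ValueError("ratings are expected in the 0-5 range")
    return 0.4 * rating_net + 0.4 * rating_on_others + 0.2 * rating_gross
```

For example, ratings of 4 (Net), 3 (on Others) and 5 (Gross) give a Rating FC of about 3.8.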

Rating FC summary: a rating derived from the 'Rating FC - all period' that, in addition, awards a bonus to anomalies that kept their good In-Sample performance in the Out-of-Sample period, and a malus (penalty) to those that did not.


Andrea Ferrari

I work on programming and finance, and I lead the research and development of ForecastCycles. I strongly believe in seasonal analysis for financial markets.