The Predictive Power of Active Share

Active Share is a popular metric that purports to measure portfolio activity. Though Active Share’s fragility and ease of manipulation are increasingly well-understood, there has been no research on its predictive power.

This paper quantifies the predictive power of Active Share and finds that, though Active Share is a statistically significant predictor of the performance difference between portfolio and benchmark, it is a weak one, explaining only approximately 5% of the variation in active management across U.S. equity mutual funds. The predictive power of Active Share is a small fraction of that achieved with robust and predictive equity risk models.

The Breakdown of Active Share

Active Share, the absolute percentage difference between portfolio and benchmark holdings, is a common metric of fund activity. The flaws of this measure are evident from simple examples:

  • If a fund with an S&P 500 benchmark buys SPXL (S&P 500 Bull 3x ETF), becoming more similar to the benchmark, its Active Share increases.
  • If a fund with an S&P 500 benchmark indexes the Russell 2000, this passive strategy has 100% Active Share.
  • If a fund F1 differs from the benchmark B in a single 5% position P1 with 20% residual (idiosyncratic, stock-specific) volatility, and a fund F2 differs from B in a single 10% position P2 with 5% residual volatility, F2 has a higher Active Share, yet is less active.
  • If a fund holds a secondary listing of a benchmark holding, its Active Share increases.
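The examples above can be checked with a few lines of arithmetic. The sketch below assumes the common definition of Active Share as half the sum of absolute weight differences between portfolio and benchmark. The tickers, the three-stock benchmark, and the `tilt` helper (which funds the single position by scaling the benchmark down pro rata) are hypothetical, and the residual-risk figure is a rough approximation that considers only the single active position.

```python
def active_share(portfolio, benchmark):
    """Half the sum of absolute weight differences; weights are fractions of NAV."""
    tickers = set(portfolio) | set(benchmark)
    return 0.5 * sum(abs(portfolio.get(t, 0.0) - benchmark.get(t, 0.0)) for t in tickers)


def tilt(benchmark, ticker, weight):
    """Hypothetical helper: benchmark scaled down pro rata plus one extra position."""
    fund = {t: w * (1.0 - weight) for t, w in benchmark.items()}
    fund[ticker] = fund.get(ticker, 0.0) + weight
    return fund


# Hypothetical three-stock benchmark B (weights sum to 1).
benchmark = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}

# A passive fund that indexes a universe with no overlap with B.
other_index = {"XXX": 0.6, "YYY": 0.4}
other_index_share = active_share(other_index, benchmark)

# F1: one 5% position with 20% residual volatility.
# F2: one 10% position with 5% residual volatility.
f1_share = active_share(tilt(benchmark, "P1", 0.05), benchmark)
f2_share = active_share(tilt(benchmark, "P2", 0.10), benchmark)
f1_residual_risk = 0.05 * 0.20  # position size times residual volatility
f2_residual_risk = 0.10 * 0.05

print("passive index on another universe:", round(other_index_share, 4))
print("F1 Active Share, residual risk:", round(f1_share, 4), round(f1_residual_risk, 4))
print("F2 Active Share, residual risk:", round(f2_share, 4), round(f2_residual_risk, 4))
```

Under these assumptions F2 shows twice the Active Share of F1 while carrying half the residual risk from its active position, which is the ordering problem the third example describes.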

In light of the above flaws, evidence that outperforming high Active Share funds may merely index higher-risk benchmarks is unsurprising.

Measuring Active Management

A common defense is that the above and similar examples are pathological or esoteric, unrepresentative of actual portfolios. This defense asserts that Active Share does measure active management of real-world portfolios.

Astonishingly, we have not seen a single paper assessing whether Active Share has any effectiveness in doing what it is supposed to do – identify which funds are more and which are less active. This paper provides such an assessment.

We consider two metrics of fund activity: Tracking Error and monthly active returns (measured as Mean Absolute Difference between portfolio and benchmark returns). Both of these metrics measure how different the portfolios are in practice. Whether Active Share has value for measuring fund activity depends on whether it can differentiate among more and less active funds.
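As a minimal sketch of the two activity metrics, the code below computes both from a series of monthly returns. It assumes Tracking Error is the sample standard deviation of monthly active returns with no annualization, which the text does not specify; the return series are made up for illustration.

```python
import statistics


def active_returns(portfolio_returns, benchmark_returns):
    """Period-by-period difference between portfolio and benchmark returns."""
    return [p - b for p, b in zip(portfolio_returns, benchmark_returns)]


def tracking_error(portfolio_returns, benchmark_returns):
    """Sample standard deviation of active returns (assumed definition, not annualized)."""
    return statistics.stdev(active_returns(portfolio_returns, benchmark_returns))


def mean_absolute_difference(portfolio_returns, benchmark_returns):
    """Mean absolute difference between portfolio and benchmark returns."""
    diffs = [abs(a) for a in active_returns(portfolio_returns, benchmark_returns)]
    return statistics.fmean(diffs)


# Hypothetical monthly returns for a fund and its benchmark.
fund = [0.02, -0.01, 0.03, 0.00]
bench = [0.01, -0.01, 0.02, 0.01]

te = tracking_error(fund, bench)
mad = mean_absolute_difference(fund, bench)
print("tracking error:", round(te, 6))
print("mean absolute difference:", round(mad, 6))
```

Both numbers grow as the fund's returns diverge from the benchmark's, regardless of how the holdings differ.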

The study dataset comprises portfolio histories of approximately three thousand U.S. equity mutual funds that are analyzable from regulatory filings. The funds had 2-10 years of history. Our study uses the bootstrapping statistical technique – we select 10,000 samples and perform the following steps for each sample:

  • Select a random fund F and a random date D.
  • Calculate Active Share of F to the S&P 500 ETF (SPY) at D.
  • Keep samples with Active Share between 0 and 0.75, indicating that SPY may be an appropriate benchmark. This step excludes small- and mid-capitalization funds that share no holdings with SPY and would all collapse into a single point with an Active Share of 1 (100%), impairing statistical analysis.
  • Measure the activity of F for the following 12 months (period D to D + 12 months). We determine how active a fund is relative to a benchmark by quantifying how similarly to the benchmark it performs.

After the above steps, we have 10,000 observations of fund activity as estimated by Active Share and actual subsequent fund activity.
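The sampling steps above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: `draw_observations`, its parameters, and the toy inputs are hypothetical, the two callables stand in for the holdings-based Active Share calculation and the 12-month activity measurement, and the sketch assumes rejected draws are replaced until the target count is reached.

```python
import random


def draw_observations(fund_dates, active_share_fn, activity_fn,
                      n_samples, max_active_share=0.75, seed=0):
    """Pair Active Share at date D with fund activity over D to D + 12 months."""
    rng = random.Random(seed)
    funds = sorted(fund_dates)
    observations = []
    attempts = 0
    # Cap the number of draws so the loop ends even if few funds qualify.
    while len(observations) < n_samples and attempts < 100 * n_samples:
        attempts += 1
        fund = rng.choice(funds)              # random fund F
        date = rng.choice(fund_dates[fund])   # random date D
        share = active_share_fn(fund, date)   # Active Share of F to SPY at D
        if not 0.0 <= share <= max_active_share:
            continue                          # SPY is unlikely to be an appropriate benchmark
        observations.append((share, activity_fn(fund, date)))
    return observations


# Toy inputs, purely illustrative.
toy_dates = {"FUND_A": ["2015-03", "2015-06"], "FUND_B": ["2016-09"], "FUND_C": ["2014-12"]}
toy_share = {"FUND_A": 0.40, "FUND_B": 0.70, "FUND_C": 0.95}
toy_activity = {"FUND_A": 2.1, "FUND_B": 3.4, "FUND_C": 6.0}

sample = draw_observations(
    toy_dates,
    active_share_fn=lambda fund, date: toy_share[fund],
    activity_fn=lambda fund, date: toy_activity[fund],
    n_samples=20,
)
print(len(sample), "observations; max Active Share:", max(s for s, _ in sample))
```

Each observation is a pair of (Active Share at D, realized activity after D), which is the input to the regressions reported below.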

The Predictive Power of Active Share for Large-Cap U.S. Equity Mutual Funds

The following results quantify the predictive power of Active Share for differentiating between more and less active U.S. equity mutual funds. For perspective, we also include results on the predictive power of robust equity risk models. These results illustrate the relative weakness of Active Share as a measure of fund activity. They also indicate that, far from mitigating legal risk through reliance on a best practice, the use of Active Share to detect closet indexing may instead create legal risk.

The Predictive Power of Active Share for Forecasting Future Tracking Error

Active Share is a statistically significant metric of fund activity, but a very weak one, predicting only about 5% of the variation in tracking error across mutual funds:

U.S. Equity Mutual Fund Portfolios: The Predictive Power of Active Share for Forecasting Future Tracking Error

Residual standard error: 1.702 on 9998 degrees of freedom
Multiple R-squared:  0.05163,   Adjusted R-squared:  0.05154 
F-statistic: 544.3 on 1 and 9998 DF,  p-value: < 2.2e-16

The Predictive Power of Active Share for Forecasting Future Active Returns

Active Share also predicts only about 5% of the variation in monthly active returns across mutual funds:

U.S. Equity Mutual Fund Portfolios: The Predictive Power of Active Share for Forecasting Future Active Return

Residual standard error: 0.3986 on 9998 degrees of freedom
Multiple R-squared:  0.04999,   Adjusted R-squared:  0.04989
F-statistic: 526.1 on 1 and 9998 DF,  p-value: < 2.2e-16

The above results make the generous assumption that all relative returns are due to active management. In fact, much relative performance is attributable to passive differences between a portfolio and a benchmark. This complexity will be captured in our follow-up research.

The Predictive Power of Robust Equity Risk Models

To put the predictive power of Active Share into perspective, we compare it to the predictive power of tracking error as estimated by robust and predictive equity risk models. Instead of Active Share, we use our default Statistical U.S. Equity Risk Model to forecast tracking error of a fund F at D.

The Predictive Power of Equity Risk Models for Forecasting Future Tracking Error

The equity risk model predicts approximately 38% of the variation in tracking error across mutual funds:

U.S. Equity Mutual Fund Portfolios: The Predictive Power of Robust Equity Risk Models for Forecasting Future Tracking Error

Residual standard error: 1.379 on 9998 degrees of freedom
Multiple R-squared: 0.3776, Adjusted R-squared: 0.3776
F-statistic: 6067 on 1 and 9998 DF, p-value: < 2.2e-16

The Predictive Power of Equity Risk Models for Forecasting Future Active Returns

The equity risk model predicts approximately 44% of the variation in monthly active returns across mutual funds:

U.S. Equity Mutual Fund Portfolios: The Predictive Power of Robust Equity Risk Models for Forecasting Future Active Return

Residual standard error: 0.3068 on 9998 degrees of freedom
Multiple R-squared:  0.4375,    Adjusted R-squared:  0.4374
F-statistic:  7776 on 1 and 9998 DF,  p-value: < 2.2e-16

Conclusions

  • Active Share is a statistically significant metric of active management (there is a relationship between Active Share and how active a fund is relative to a given benchmark), yet the predictive power of Active Share is very weak.

  • Active Share predicts only about 5% of the variation in tracking error and active returns across U.S. equity mutual funds.

  • A robust and predictive equity risk model is approximately 7 to 9 times more effective than Active Share, predicting approximately 40% of the variation in tracking error and active returns across U.S. equity mutual funds.

  • In the following articles, we will put the above predictive statistics into context and quantify how likely Active Share is to identify closet indexers.
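The reported regression statistics can be cross-checked with simple arithmetic. The sketch below uses only the R-squared values, F-statistics, and degrees of freedom printed above. For a regression with one predictor, the F-statistic equals R² / (1 - R²) times the residual degrees of freedom, and the square root of R² is the absolute correlation between predictor and outcome. The function name `f_statistic` is ours.

```python
def f_statistic(r_squared, residual_df, predictors=1):
    """F-statistic implied by R-squared for an ordinary least-squares regression."""
    return (r_squared / predictors) / ((1.0 - r_squared) / residual_df)


# (R-squared, reported F-statistic), all on 9998 residual degrees of freedom.
reported = {
    "Active Share -> tracking error": (0.05163, 544.3),
    "Active Share -> active return": (0.04999, 526.1),
    "Risk model -> tracking error": (0.3776, 6067.0),
    "Risk model -> active return": (0.4375, 7776.0),
}

for label, (r2, f_reported) in reported.items():
    implied_f = f_statistic(r2, 9998)
    implied_corr = r2 ** 0.5
    print(label, "| implied F:", round(implied_f, 1),
          "| reported F:", f_reported, "| correlation:", round(implied_corr, 3))

# Ratios behind the "7 to 9 times" comparison.
ratio_tracking_error = 0.3776 / 0.05163
ratio_active_return = 0.4375 / 0.04999
print(round(ratio_tracking_error, 2), round(ratio_active_return, 2))
```

The implied F-statistics agree with the reported ones up to rounding of R², and the two ratios of explained variance fall between 7 and 9.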

The information herein is not represented or warranted to be accurate, correct, complete or timely.
Past performance is no guarantee of future results.
Copyright © 2012-2019,  Alpha Beta Analytics, LLC. All rights reserved.
Content may not be republished without express written consent.

Equity Analytics

Passively-available beta differences with a benchmark are a byproduct, typically unintentional, of any stock-selection process.  Since consistent passive differences, once properly identified, can be freely obtained or offset, they are not part of active contribution.

Isolating active performance from the impact of consistent passive differences offers tremendous oversight advantages.

Unfortunately, current methodologies all fail to properly define passive exposures. As a result, current analytics fail to predict future performance, and analytics are only valid to the extent they are predictive.

If risk analytics are valid, they’re predictive.  Analytics that fail to predict future performance are invalid.

We offer highly predictive* statistical risk models built to isolate active contributions from passively-available exposures. These models reveal security-selection skill that persists, true active risk, opportunities to reduce relative risk without sacrificing active risk, and ways to offset any unintentional bets that may endanger performance.

* Over 0.96 median correlation between predicted and subsequent realized returns

FAQs

What’s wrong with FactSet’s attributions and risk estimates and how can you prove it?

Brinson/Active Share fail to consider differences in individual security market and sector betas. As a consequence, they fail to properly separate active from passive performance. The problem can be demonstrated by observing non-zero correlations between security selection and passive return components.

Security-selection return is defined as a residual (that portion of incremental return unexplained by various passive market exposures) so by definition it is uncorrelated with passive benchmarks or exposures. To the extent security-selection return and passive return calculated by a given system are found to be correlated, the system has failed to properly isolate active contribution and will fail to detect skill and active risk.

A simple example may be how Brinson attribution will deal with a leveraged passive ETF. If SPY returns +10% and a 2x leveraged SPY ETF returns +20%, the Brinson approach will attribute +10% to security selection, instead of the passive market effect.
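The leveraged ETF case can be written out as arithmetic. The sketch below treats the whole portfolio as a single segment with 100% weight, uses the textbook selection effect (benchmark weight times the return difference), and compares it with a beta-adjusted split. The function names are ours and the single-segment setup is a simplifying assumption.

```python
def brinson_selection(benchmark_weight, portfolio_segment_return, benchmark_segment_return):
    """Selection effect for one segment: weight times return difference."""
    return benchmark_weight * (portfolio_segment_return - benchmark_segment_return)


def beta_adjusted_split(portfolio_return, benchmark_return, beta):
    """Split return into a passive part (beta times benchmark) and a residual."""
    passive = beta * benchmark_return
    residual = portfolio_return - passive
    return passive, residual


spy_return = 0.10        # SPY returns +10%
leveraged_return = 0.20  # 2x leveraged SPY ETF returns +20%

selection = brinson_selection(1.0, leveraged_return, spy_return)
passive, residual = beta_adjusted_split(leveraged_return, spy_return, beta=2.0)

print("Brinson selection effect:", round(selection, 4))
print("beta-adjusted passive return:", round(passive, 4))
print("beta-adjusted security selection:", round(residual, 4))
```

The Brinson-style calculation credits the entire 10% excess return to selection, while the beta-adjusted split assigns all 20% to passive market exposure and leaves no residual.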

Returns-Based Style (RBS) analysis estimates average exposure over time and fails for active portfolios in which exposures change through time. The problem with RBS can be demonstrated by comparing current predictions with future performance.

In fact, risk and skill analytics are only valid to the extent they are predictive. Skilled managers detected by Brinson attributions should tend to outperform in subsequent years, and risks estimated by RBS should predict future returns with reasonable accuracy.

In our tests, we’ve found neither of FactSet’s approaches to be predictive.

 

Why is your approach better and how can you prove it?

Robust estimates of point-in-time betas overcome limitations of Brinson/RBS and result in predictive attributions and risk exposures. This is demonstrated both with the persistence of security selection skill and with the correlation of returns predicted by current exposures with future realized returns.

 

If RBS fails to capture changing exposures, why not just look at rolling regressions?

Regressions provide an estimate of average exposure over a period rather than exposure as of a point-in-time. To the extent exposures change significantly during the period (as with active portfolios) averages may be a poor approximation for any given point-in-time exposure. Both attributions and portfolio risk estimates require point-in-time exposures.

Rolling regressions would simply increase the amount of bad data. For example, consider a manager who ran a 1.5x market beta portfolio last month but got worried this month and switched to a 0.5x market beta portfolio. A regression window spanning both months estimates roughly the average of the two betas, which describes neither month, so a rolling regression will produce garbage output in such cases.
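A minimal sketch of the beta-switching example: the portfolio below has a 1.5 beta in the first month and a 0.5 beta in the second, with no residual return at all. The market return series is hypothetical and, to keep the arithmetic clean, is assumed to follow the same path in both months.

```python
def ols_beta(y, x):
    """Slope of an ordinary least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


# Hypothetical market returns for one month, repeated for the second month.
one_month = [0.010, -0.020, 0.015, -0.005, 0.020, -0.010]
market = one_month * 2

# True point-in-time betas: 1.5 last month, 0.5 this month.
true_betas = [1.5] * len(one_month) + [0.5] * len(one_month)
portfolio = [beta * m for beta, m in zip(true_betas, market)]

window_beta = ols_beta(portfolio, market)
current_beta = true_betas[-1]
print("regression beta over both months:", round(window_beta, 4))
print("actual current beta:", current_beta)
```

The two-month regression reports a beta that matches neither regime, so any attribution or risk estimate built on it misstates the current exposure.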

 

Are your tests time period dependent?  Why should we be confident in your data?

You don’t need to have any confidence in our data. By testing a replicating passive portfolio for a manager, you can see the predictive effectiveness of the models for yourself.

In general, you are right to be skeptical and should only trust risk models and performance analytics that you can test out-of-sample yourself, such as with the replicating portfolio tests that we advise.

 

Why can’t we do the same thing you’re doing ourselves?

Both the Brinson and the RBS analysis in FactSet can be readily replicated in Excel.

Statistical equity risk models, on the other hand, are mathematically complex and require properly translating individual security exposures to portfolio exposures and variances.

Some analytics define risk as of a point in time as the volatility of a portfolio given its holdings and their recent returns until that point in time. Is this estimate accurate?

This is a reasonable approximation of current risk and, by extension, VaR. However, it does not identify any of the market factors that contribute to current risk, and knowledge of underlying exposures is critical.

There are three main reasons:

  • If you don’t know what the sources of risk are, you don’t know what can be done to make changes and mitigate any problems. For example, if two portfolios have +10% and -10% statistical exposure to Emerging Markets, their tracking error and VaR will be the same, but the actual risks (and remedies) are the opposite of each other. It’s not terribly helpful to know what current risk is if you don’t know what measures can be taken if that risk is too high or too low.

 

  • Current risk, without knowledge of market exposures, cannot be used for stress testing over different market regimes or historical periods.

 

  • Portfolios with equal risk defined by recent history may have very different underlying exposures, which coincidentally have had the same recent volatility, and those exposures may have completely different long-term risk profiles which would remain hidden with the more simplistic approach. Hidden exposures to market factors that have had uncharacteristically low recent volatility may cause true current risk to be seriously underestimated. Modeling tail risk based on standard deviations, without quantifying sources of volatility, risks missing the forest for the trees.
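The Emerging Markets example in the first reason above can be sketched numerically. The factor volatility and the stress scenario below are hypothetical, and the portfolios are assumed to differ from the benchmark only in this one exposure.

```python
def factor_tracking_error(exposure, factor_volatility):
    """Tracking error from a single factor exposure: magnitude times volatility."""
    return abs(exposure) * factor_volatility


def scenario_return(exposure, factor_shock):
    """Relative return if the factor moves by factor_shock."""
    return exposure * factor_shock


em_volatility = 0.20  # hypothetical annualized factor volatility
em_shock = -0.15      # hypothetical stress scenario: factor falls 15%

long_em, short_em = 0.10, -0.10  # +10% and -10% exposure

te_long = factor_tracking_error(long_em, em_volatility)
te_short = factor_tracking_error(short_em, em_volatility)
stress_long = scenario_return(long_em, em_shock)
stress_short = scenario_return(short_em, em_shock)

print("tracking error:", round(te_long, 4), round(te_short, 4))
print("stress return:", round(stress_long, 4), round(stress_short, 4))
```

A volatility number alone cannot tell these two portfolios apart, yet they respond to the same scenario in opposite directions and call for opposite hedges.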

 

Do you run your factor-oriented model on holdings, observed historical returns, or both?

Our factor models are built for individual stocks by analyzing observed historical returns. The regression of stock returns on factors calculates stocks’ factor exposures.

These individual stocks’ factor exposures are then aggregated for a portfolio using holdings data to estimate portfolio factor exposures over time.

In summary, the analysis uses both holdings and returns: returns of individual stocks to estimate the factor exposures of stocks and portfolio holdings of individual stocks to estimate the factor exposures of portfolios.
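The two-step process described above can be sketched as follows. This is a simplified single-factor illustration, not our production model: the stock names, return series, weights, and function names are hypothetical, and a plain least-squares slope stands in for the robust multi-factor estimation.

```python
def ols_beta(y, x):
    """Slope of an ordinary least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


def portfolio_exposure(holdings, stock_betas):
    """Holdings-weighted sum of individual stock factor exposures."""
    return sum(weight * stock_betas[ticker] for ticker, weight in holdings.items())


# Step 1: estimate each stock's exposure from its observed historical returns.
factor_returns = [0.010, -0.020, 0.015, -0.005, 0.020]
stock_returns = {
    "STOCK_A": [1.2 * f + 0.001 for f in factor_returns],  # constructed with beta 1.2
    "STOCK_B": [0.8 * f - 0.002 for f in factor_returns],  # constructed with beta 0.8
}
stock_betas = {t: ols_beta(r, factor_returns) for t, r in stock_returns.items()}

# Step 2: aggregate stock exposures to the portfolio using holdings as of a date.
holdings = {"STOCK_A": 0.5, "STOCK_B": 0.5}
exposure = portfolio_exposure(holdings, stock_betas)

print({t: round(b, 4) for t, b in stock_betas.items()})
print("portfolio factor exposure:", round(exposure, 4))
```

Because the aggregation uses holdings as of a given date, the portfolio exposure changes the moment the holdings change, without waiting for new portfolio returns to accumulate.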

 

Your white paper states that your approach has 0.96 median correlation between predicted ex-ante and reported ex-post portfolio returns. This, I guess, is total return, and not excess return. Also, I assume that you need to know the risk factor realization for the future, and it’s not a pure “prediction”, right? Still, of course, very impressive correlation.

Yes, this is the 0.96 median correlation between predicted ex-ante and reported ex-post total return. The median correlation between predicted and actual excess return is 0.66.

Another way to say this is that 0.96 is the correlation between the replicating passive factor portfolio constructed using the model and the subsequent actual portfolio returns. This replication factor portfolio does not imply any knowledge of the future factor return realizations.

 

The paper shows Apple’s 2.3 sector beta… This surprises me: over what period, and relative to what sector index?

This is beta to the technology sector index, as of 12/30/17, and is based on returns over the previous three years, with a decay factor.

Note that we analyze sector exposure separately from the Market exposure. Your intuition that AAPL has a lower overall risk is correct — its market exposure is ~1. So AAPL has ~1 Market beta and ~2 Technology beta after controlling for Market risk. The ability to measure Market and Technology exposures of AAPL independently, and not assuming that they are equal, is a critical edge of our and other statistical factor models.

Our technology factor is the cap-weighted index of all U.S. Technology stocks. It is materially identical to the Russell 3000 Technology Index. In practice, the Technology Select Sector ETF (XLK) is also a good proxy. We can share a simplified model illustrating this relationship using AAPL, SPY, and XLK, if it would be helpful.

 

Given your example of Apple stock having a 2.2 beta to the sector, how persistent or how long can you reasonably forecast that stock keeping that beta?

The change in betas over time differs across companies. In the case of AAPL, the following is its Sector Exposure (Beta) over time:

The beta changes over several regimes, but remains stable for some time within each regime. It’s interesting to note the change in tech beta in ‘07 when the iPhone was introduced and transformed the company from an idiosyncratic niche player to the driver of the industry’s profits.

What’s equally important, we know that these estimates are unbiased predictors of the subsequent realized betas. So, even as the betas change over time, our estimate at a given point neither over- nor under-estimates Market and Sector betas.

 

What are the criteria for the 0.96 correlation? For example, is it binary in nature: either the actual return hit the exact predicted number or it didn’t? Or is there still some issue with P-hacking, such as large confidence intervals that make the actual more likely to fall within the predicted range?

This is a Pearson’s correlation (https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) of predicted and actual returns. There is no funny business.

Put differently, since a correlation of 0.96 corresponds to an R-squared of about 0.92, our past factor exposures explain about 92% of the variance of subsequent monthly returns for the median portfolio.

While it is always wise to be concerned with P-hacking, the above results have held up out-of-sample over several years, thousands of funds, for new institutions, and in new markets.

That said, the best way to prove effectiveness is to take a few sample portfolios, have us replicate them, and check for yourself how well these predictions hold up in the future as well as how they compare to the predictions of other analytics vendors and consultants. The only way to prove the effectiveness of a system or to compare it to that of other systems or processes is to benchmark all out-of-sample and compare the effectiveness of predictions.

Can you explain how the passive ETF replicating portfolio is constructed? Is there a static allocation to a group of various ETFs over time, or do the allocations and types of ETFs used get rebalanced over time?

The particular ETFs used as factors in our risk models are constant (market, sector, style, and bonds for the US model) and all are available passively, which is key. The passive component of incremental return is based on the average exposure (beta) over time (ten years assuming sufficient holding data) to each factor. The timing component is that due to variation in factor exposure, and security-selection is the residual relative to the return calculated by the model. The difference between the model’s calculated return and the portfolio’s actual reported return is also shown as trading/unexplained.
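The split into passive, timing, and security-selection components can be sketched for a single factor as follows. The betas, factor returns, and portfolio returns are hypothetical, the function name `decompose` is ours, and the trading/unexplained component is left out by assuming the model's calculated return equals the reported return.

```python
def decompose(portfolio_returns, betas, factor_returns):
    """Per-period (passive, timing, selection) for a single-factor model."""
    average_beta = sum(betas) / len(betas)
    rows = []
    for r, beta, f in zip(portfolio_returns, betas, factor_returns):
        passive = average_beta * f          # return from average exposure
        timing = (beta - average_beta) * f  # return from variation in exposure
        selection = r - beta * f            # residual to the point-in-time factor return
        rows.append((passive, timing, selection))
    return rows


# Hypothetical three periods.
betas = [1.2, 1.0, 0.8]                   # point-in-time factor exposures
factor_returns = [0.02, -0.01, 0.03]
portfolio_returns = [0.030, -0.008, 0.020]

components = decompose(portfolio_returns, betas, factor_returns)
for r, (passive, timing, selection) in zip(portfolio_returns, components):
    print(round(r, 4), "=", round(passive, 4), "+", round(timing, 4), "+", round(selection, 4))
```

The three components add up to the portfolio return in every period, so nothing is lost in the split; only the labels on each piece change.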

Your question goes to the core definition of active return from stock picking vs. active return from stock picking and factor timing.  You can do either:

1) If you construct a single replicating ETF portfolio and never rebalance it, then the performance of a fund relative to this portfolio would be due to both factor/market timing returns (returns due to variation in systematic risk) and alpha/residual/stock picking returns (idiosyncratic returns unattributable to systematic risk).

2) If you construct/rebalance replicating ETF portfolios periodically to capture variable systematic risk over time, then the performance of a fund relative to this portfolio would be due to alpha/residual/stock picking returns (idiosyncratic returns unattributable to systematic risk).

Over a short period, such as a few months or a few years for low-turnover managers, factor timing returns are immaterial. The second approach, which we take, isolates stock picking from timing. Stock picking also ends up being a larger and more persistent source of active returns for most managers.

A few weeks to a few months of tracking a portfolio against a static replicating ETF portfolio without any rebalancing should be sufficient in most cases to validate the predictive value of our models.