Splitting hairs / degrees of freedom
Posted on January 12, 2018 at 03:43:50 PM by Tiger
You can slice and dice the data down to the most trivial level, but each time you do, you increase the variance of your estimates, because the sample size is smaller.
Example: Take Win percentage. It IS a reasonable measure of performance, no doubt, but hardly enough. So now, we separate Singles and Doubles to get a better look, but lo, we now have fewer results in the sample, so the variance increases.
This trend continues as you break it down further: Serve vs Receive, Wednesday vs Thursday - you name it. Each split may give you additional information, but each time the statistical reliability drops.
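To put rough numbers on that, here is a minimal sketch in Python. It assumes each game is an independent win/loss trial (a binomial approximation), and the records and the function name win_pct_standard_error are made up for illustration, not taken from my systems. It shows how the uncertainty on a win percentage grows as the same 60% record is cut into smaller subsets.

```python
import math


def win_pct_standard_error(wins, games):
    """Standard error of an observed win percentage (binomial approximation)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    return math.sqrt(p * (1 - p) / games)


# Hypothetical records: the same 60% win rate, in ever-smaller subsets.
splits = [
    ("All games", 60, 100),
    ("Singles only", 30, 50),
    ("Singles, serving", 15, 25),
]

for label, wins, games in splits:
    se = win_pct_standard_error(wins, games)
    print(f"{label:18s} n={games:3d}  win%={wins / games:.2f}  +/- {se:.3f}")
```

Cutting the sample to a quarter of its size doubles the standard error, since it scales with 1/sqrt(n).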
My computer systems, from the get-go, have been designed to work exclusively from 'published' data, e.g., entries & results. Now, I get an awful lot of info from those published sources, much more than you would imagine. And, that's not to say I don't add personal (observational) input when doing serious handicapping. But the systems are basically designed to be 'dumb' and 'remote'.
Back to the issue, I use many factors in my analyses, but with some common threads:
1. Assumptions/results should be 'testable'; a factor that is not significant at a target confidence level can be ignored.
2. There should be an underlying logic (rationale) for use of a specific factor. Using odd/even day numbers, for example, is a silly notion. It COULD be tested, but why? Matinee vs Evening, however - you COULD make a case (see Bolivar discussion).
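As an illustration of point 1, here is one way a factor could be checked against a target confidence level. This is a sketch only, assuming a simple two-proportion z-test; it is not necessarily the test my systems use. The function names, the Matinee/Evening counts, and the 0.05 threshold are all hypothetical.

```python
import math


def two_proportion_z_test(wins_a, games_a, wins_b, games_b):
    """Return (z, two-sided p-value) for the difference in two win rates."""
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    # Pooled win rate under the assumption that the factor makes no difference.
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def factor_is_significant(wins_a, games_a, wins_b, games_b, alpha=0.05):
    """True if the split clears the target level (alpha is an assumed choice)."""
    _, p_value = two_proportion_z_test(wins_a, games_a, wins_b, games_b)
    return p_value < alpha


# Hypothetical Matinee vs Evening records for one player.
print(factor_is_significant(55, 100, 45, 100))      # small sample
print(factor_is_significant(550, 1000, 450, 1000))  # same rates, 10x the games
```

The same 55% vs 45% gap fails the test on 100 games a side and passes on 1000 a side, which is the sample-size problem again. A test like this only covers point 1; it says nothing about whether the factor has a rationale behind it.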