Statistics

Commentary under construction, but the links work.

Only one quarter of all pairs have been calculated, but we already see something that needs looking into: the large differences in the number of solutions per pair. October 6 has only 7 solutions, while June 7 has 191.
This is where statistics come in: the average is 70 and the standard deviation is 41. A standard deviation of more than half the average is a sign that something strange is going on.
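The ratio of standard deviation to average is known as the coefficient of variation (CV). With the figures above it works out as:

```latex
\mathrm{CV} = \frac{\sigma}{\mu} = \frac{41}{70} \approx 0.59 > 0.5
```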

Can we find an explanation? Or see a pattern? And then: a pattern of what?
The first attempt: does it make a difference where the 2 squares of the pair are located? In particular: where are they located with respect to the sides of the board?

There are 2 extremes:
- In a corner, touching 2 sides
- In the middle of the board, far away from all sides
Most squares lie somewhere between these 2 extremes.

Fortunately, there is a number that indicates how far the squares of a pair are from the sides: the number of filtered positions.
It will be explained on the page Definitions, but since that has not been done yet, here is a short introduction. Here is the situation for the rectangle, later to be called form 1.

If you are looking for a date in January, the following 2 positions are filtered out, i.e. they are not available in any solution. You can clearly see the 2 orientations:

[Figure: the 2 filtered positions of form 1 for Jan, one per orientation]
Jan Feb Mar Apr May Jun xxx
Jul Aug Sep Oct Nov Dec xxx
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 xxx xxx xxx xxx

The other extreme is 11, a square in the middle of the board. Here there are 12 filtered positions, 6 for each orientation:

[Figure: the 12 filtered positions of form 1 for 11, 6 per orientation]

Other forms behave in roughly the same way. Things may be a bit more complicated with irregularly shaped forms, but the bottom line is the same: squares in the middle filter more positions than squares in the corners.
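As a rough check of the two extremes, here is a minimal Python sketch. It assumes that a filtered position of a square is a placement of form 1 that lies entirely on the board and covers that square, and that form 1 is a 2×3 rectangle (the shape that gives 6 positions per orientation at 11); the exact definition is reserved for the page Definitions. The names BOARD, fits, locate and filtered_positions are hypothetical, made up for this sketch.

```python
# The board as shown above; "xxx" marks cells that are not part of it.
BOARD = [
    "Jan Feb Mar Apr May Jun xxx".split(),
    "Jul Aug Sep Oct Nov Dec xxx".split(),
    [str(d) for d in range(1, 8)],
    [str(d) for d in range(8, 15)],
    [str(d) for d in range(15, 22)],
    [str(d) for d in range(22, 29)],
    "29 30 31 xxx xxx xxx xxx".split(),
]
# Assumed shape of form 1 in its 2 orientations, as (height, width).
FORM1_ORIENTATIONS = [(2, 3), (3, 2)]


def fits(top, left, height, width):
    """True if the rectangle with this top-left cell lies entirely on the board."""
    if top < 0 or left < 0 or top + height > len(BOARD) or left + width > len(BOARD[0]):
        return False
    return all(
        BOARD[r][c] != "xxx"
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def locate(label):
    """Return (row, col) of the square with this label, e.g. "Jan" or "11"."""
    for row, cells in enumerate(BOARD):
        if label in cells:
            return row, cells.index(label)
    raise ValueError(f"unknown square: {label}")


def filtered_positions(label):
    """Placements (height, width, top, left) of form 1 that cover the square."""
    row, col = locate(label)
    found = set()
    for height, width in FORM1_ORIENTATIONS:
        # Try every top-left corner from which the rectangle would cover (row, col).
        for top in range(row - height + 1, row + 1):
            for left in range(col - width + 1, col + 1):
                if fits(top, left, height, width):
                    found.add((height, width, top, left))
    return found


for square in ["Jan", "11"]:
    print(square, len(filtered_positions(square)))
```

It prints `Jan 2` and `11 12`, matching the counts in the two figures.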

When dealing with pairs, the 2 squares may both be in a corner, both be in the middle, one in a corner and one in the middle, etc.
It is also possible that the 2 squares filter the same position. So the number of filtered positions of a pair is less than or equal to the sum of the numbers of filtered positions of the 2 squares.
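The inequality can be checked with a small sketch under the same assumption about form 1 (a 2×3 rectangle whose filtered positions are the on-board placements covering the square). Here the board is described by coordinates (row, column, counting from 0 at Jan) with the 6 xxx cells as holes, so 11 sits at (3, 3) and 12 at (3, 4). The function name placements_covering is hypothetical. The pair's filtered positions are taken as the set union of those of its 2 squares, so a shared placement is counted once.

```python
# Cells (row, col) of the 7x7 grid that are not part of the board (the "xxx" cells).
HOLES = {(0, 6), (1, 6), (6, 3), (6, 4), (6, 5), (6, 6)}
ROWS, COLS = 7, 7
FORM1_ORIENTATIONS = [(2, 3), (3, 2)]  # assumed (height, width) of form 1


def on_board(r, c):
    """True if (r, c) is a usable cell of the board."""
    return 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in HOLES


def placements_covering(cell):
    """Form-1 placements (height, width, top, left) that lie on the board and cover cell."""
    r0, c0 = cell
    found = set()
    for h, w in FORM1_ORIENTATIONS:
        for top in range(r0 - h + 1, r0 + 1):
            for left in range(c0 - w + 1, c0 + 1):
                cells = [(top + i, left + j) for i in range(h) for j in range(w)]
                if all(on_board(r, c) for r, c in cells):
                    found.add((h, w, top, left))
    return found


a = placements_covering((3, 3))  # square 11
b = placements_covering((3, 4))  # square 12
pair = a | b                     # a placement filtered by both squares counts once
print(len(a), len(b), len(pair))
assert len(pair) <= len(a) + len(b)
```

For the pair 11 and 12 this gives 12 + 12 = 24 separately, but only 17 for the pair, because 7 placements of the rectangle cover both squares.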

Hypothesis: the smaller the number of filtered positions of a pair, the larger the number of solutions.

Let me start with a spoiler: this hypothesis seems to be incorrect.
In the first linked file below we have calculated the correlation coefficient: -0.31. It would have to be much closer to -1 for the hypothesis to hold.
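For reference, a minimal sketch of the Pearson correlation coefficient, assuming that is the correlation meant here (the page does not name the method). It ranges from -1 (a perfect decreasing straight-line relation) to +1. With r = -0.31 we get r² ≈ 0.10: a straight line would account for only about 10% of the spread in the number of solutions. The function name pearson and the numbers in the example are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and spreads, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up numbers, only to show the call: filtered positions vs. solutions per pair.
filtered = [40, 52, 47, 61, 55]
solutions = [150, 60, 120, 30, 95]
print(round(pearson(filtered, solutions), 2))
```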

The idea is that a smaller number of filtered positions implies a larger number of unfiltered positions, which have a bigger chance of being combined into a solution. But alas, it is not that simple.

If anyone has a brilliant idea for a way to calculate (or at least estimate) the number of solutions per pair, let them speak up now.

Here are the results so far. First, the list of all calculated pairs, reverse sorted either by number of solutions or by number of filtered positions. The header gives details on the solutions and the filtered positions.
For the filtered positions you see the average, minimum, maximum and standard deviation.
For the solutions you see these values as well, plus a total and an estimated end total. The total is the sum of the solutions calculated so far.
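A minimal sketch of these header values for a list of counts (solutions or filtered positions per pair). Assumption: the page does not say whether the sample or the population standard deviation is used; this sketch takes the sample version, `statistics.stdev`, while `statistics.pstdev` would give the population version. The function name header_stats and the counts in the example are made up.

```python
import statistics


def header_stats(counts):
    """Average, minimum, maximum, standard deviation and total of a list of counts."""
    return {
        "average": statistics.mean(counts),
        "minimum": min(counts),
        "maximum": max(counts),
        # Sample standard deviation (an assumption, see above); needs at least 2 counts.
        "standard deviation": statistics.stdev(counts),
        "total": sum(counts),
    }


# Made-up solution counts for a handful of pairs.
print(header_stats([12, 95, 70, 33, 140]))
```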


The estimated end total is based on the assumption that the current average is a good predictor of the average over all pairs. In formulas:
We need to calculate N pairs, but so far we have done only M ≤ N. The sum of the solutions of these M pairs is S. Hence the average is S÷M. The estimated end total is N×S÷M.

There is also an estimated standard deviation (of the estimated end total). To explain how it is calculated, we introduce:
- A: the average, equal to S÷M
- B: the standard deviation of A. For the calculation, see any statistics textbook

Now we rewrite the estimated end total: 
N×S÷M=
[M×S÷M]+[(N-M)×S÷M]=
[S]+[(N-M)×A]

The first term is the total so far. That is a known, calculated value with standard deviation 0: it is what it is.
The second term is the actual estimate of the sum of the remaining, not yet calculated solutions. Since B is the standard deviation of A, [(N-M)×A] has a standard deviation of [(N-M)×B]. Notice that this value decreases with increasing M, and becomes 0 when M=N.
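The estimate and its standard deviation can be sketched in Python as follows. It follows the formulas above, with one assumption the page does not spell out: B is taken as the textbook standard error of the mean, s÷√M, where s is the sample standard deviation of the M solution counts so far. The function name estimate_end_total and the example numbers are made up.

```python
import math
import statistics


def estimate_end_total(solutions, n_total):
    """Return (estimated end total, its standard deviation) for N = n_total pairs.

    estimate = S + (N - M) * A, with A = S / M (equal to N * S / M)
    std      = (N - M) * B, with B assumed to be s / sqrt(M)
    """
    m = len(solutions)
    if m < 2 or m > n_total:
        raise ValueError("need 2 <= M <= N")
    s_total = sum(solutions)                         # S: total so far, known exactly
    a = s_total / m                                  # A: current average
    b = statistics.stdev(solutions) / math.sqrt(m)   # B: assumed standard error of A
    remaining = n_total - m
    return s_total + remaining * a, remaining * b


# Made-up example: 3 pairs calculated out of 6.
print(estimate_end_total([10, 20, 30], 6))
```

When all pairs are calculated (M = N), the second value is 0, as noted above.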

There are some more pages with groupings. First there is the grouping per category: date, incorrect date, double month and double number. The objective is to see whether there is a big difference between these categories. At this moment (July 17, 2025) there is too little double-month and double-number data to say anything about it.
Here too the list is available reverse sorted by number of solutions or by number of filtered positions.

Lastly, the pairs are grouped per square. That means every pair appears twice, i.e. in the list of each of its 2 squares. Again the list is reverse sorted by number of solutions or by number of filtered positions.
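A minimal sketch of this grouping, assuming the results are held as a mapping from a pair of square labels to its number of solutions (a hypothetical data layout; the pairs and counts below are made up). Every pair is added to the list of each of its 2 squares, so it shows up twice.

```python
from collections import defaultdict


def group_per_square(pair_solutions):
    """Map each square to its (pair, solutions) entries, reverse sorted by solutions.

    Every pair is added under both of its squares, so it appears twice overall.
    """
    groups = defaultdict(list)
    for pair, solutions in pair_solutions.items():
        for square in pair:
            groups[square].append((pair, solutions))
    for entries in groups.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(groups)


# Made-up solution counts for 3 pairs.
example = {("Mar", "4"): 10, ("Apr", "9"): 30, ("Mar", "9"): 20}
for square, entries in group_per_square(example).items():
    print(square, entries)
```

Sorting by number of filtered positions works the same way with a different sort key.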

The previous list may one day be summarised per square. Currently I have no idea when or how, but I have already reserved 2 pages, as always reverse sorted by number of solutions and by number of filtered positions.
