Statistics

To follow this page, be sure to read the definitions page first. It is especially important to understand the concept of the number of filtered positions per pair.

Looking at the number of solutions per pair, there are big differences. Among the 366 dates:
- January 25 has 216 solutions, the most of any date
- October 6 has 7 solutions, the fewest

Looking beyond the 366 dates, we find a total of 903 pairs, with even bigger differences:
- January/July has 305 solutions, the maximum
- 7 pairs have 0 solutions. They are listed on a separate page.
An extra remark about the last finding: there are no dates among these pairs. Frankly I am impressed that DragonFjord managed to avoid dates without solutions.

Some statistical highlights:

Group            # pairs   Avg    SD   Min   Max
All                  903    66    44     0   305
Dates                366    67    38     7   216
Incorrect dates        6   109    51    34   183
Double months         66    88    63     0   305
Double numbers       465    63    44     0   286
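
The group sizes in this table can be checked by simple counting. The sketch below assumes that the 43 squares are the 12 months plus the numbers 1 to 31, and that an "incorrect date" is a month/number combination that does not exist on the calendar (such as February 30). The variable names are mine, for illustration only.

```python
from math import comb

MONTHS = 12   # assumed: one square per month
NUMBERS = 31  # assumed: one square per day number 1..31

# Days per month in a leap year, so that February 29 counts as a date.
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

month_number_pairs = MONTHS * NUMBERS          # every month/number combination
dates = sum(DAYS_IN_MONTH)                     # combinations that are real dates
incorrect_dates = month_number_pairs - dates   # e.g. February 30
double_months = comb(MONTHS, 2)                # two month squares
double_numbers = comb(NUMBERS, 2)              # two number squares
all_pairs = comb(MONTHS + NUMBERS, 2)          # any two of the 43 squares

print(dates, incorrect_dates, double_months, double_numbers, all_pairs)
# prints: 366 6 66 465 903
```

The four groups add up to the 903 pairs of group "All".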

We will come to the details later on this page. Some quick conclusions:

  • The overall average (66) is almost the same as the average of group "Dates" (67). Actually, the difference is smaller than shown here because 66 is rounded down and 67 is rounded up.
  • The standard deviation (SD) of group "Dates" is 38, the lowest of all groups and lower than the overall SD. The range 7 to 216 is wide, but less wide than the ranges of groups "double months" and "double numbers".
    The small group "Incorrect dates" has a much smaller range, but nevertheless a larger SD. This is related to the small group size.

Looking at these enormous differences in the number of solutions, one starts wondering whether it is possible to predict the number of solutions of a pair.

Hypothesis: The smaller the number of filtered positions of a pair (and hence the larger the number of unfiltered, usable positions), the larger the number of solutions.

Let me start with a spoiler: this hypothesis gives an indication, but hardly a prediction.
In the first linked file below we have calculated the correlation factor: -0.31. It would have to be much closer to -1 for the number of filtered positions to be a useful predictor.
We had a vague (but not very high) hope that the result would be much better when looking at squares rather than pairs. The correlation factor turns out to be slightly stronger, but still not impressive: -0.42.
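
For readers who want to reproduce such a figure, here is a minimal sketch of a correlation calculation. It assumes the correlation factor is the ordinary Pearson correlation coefficient, which the linked file may or may not use. The four data points are pairs quoted elsewhere on this page (Jan/Jul, 29/30, 3/12 and 6/14). They are hand-picked extremes, so the outcome says nothing about all 903 pairs.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Four pairs quoted on this page: Jan/Jul, 29/30, 3/12, 6/14.
solutions = [305, 200, 37, 0]
filtered = [72, 55, 387, 170]
print(round(correlation(solutions, filtered), 2))
```

A value of -1 would mean that all points lie exactly on a decreasing line, and a value near 0 that there is no linear relation at all.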

But a picture tells more than a thousand words. The first graph shows all 903 pairs:

  • the X axis shows the number of solutions, ranging from 0 to 305 with an average of 66.
  • the Y axis shows the number of filtered positions, ranging from 55 to 387 with an average of 215.
Notice the decreasing trend line: on average, an increase in the number of solutions goes with a decrease in the number of filtered positions. But the point cloud makes clear that the correlation is weak. Some examples from the graph:
  • In the lower right-hand corner we see Jan/Jul with the maximum of 305 solutions. Its 72 filtered positions are not the minimum, but it is among the 17 pairs in the bottom range of 55 to 72 filtered positions.
    So that looks pretty good, as is also clear from the fact that the point is almost on the trend line.
  • On the left, on the Y axis, there are 7 points with 0 solutions. As we show on a separate page (and as you can see in the graph) all 7 points are well below the trend line and all have a surprisingly (and disappointingly) low number of filtered positions: the largest number comes with pair 6/14: 170 filtered positions.
  • Pair 3/12 has the largest number of filtered positions: 387. It has only 37 solutions, a little over half of the average of 66. That fits the hypothesis fairly well.
  • Pair 29/30 filters only 55 positions, the minimum. It has 200 solutions, which again fits fairly well.

The graph below shows the data for the 43 individual squares:
  • The X axis shows the number of solutions, ranging from 1058 to 5421 with an average of 2781.
    Notice the average is 42 times larger than the pair average. That makes sense, because the total number of solutions per square is made up by adding up the numbers of solutions of 42 pairs: the selected square combined with all 42 other squares.
  • The Y axis shows the number of filtered positions, ranging from 27 to 221 with an average of 113.
    Notice this average is roughly half of 215, the average number of filtered positions per pair. 113 is slightly more than 215÷2=107.5, because there are overlaps, positions filtered by both squares in a pair.
Again we see that there is a real trend, but also a large point cloud.
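
The two "Notice" remarks above can be checked with a little arithmetic. Below is a minimal sketch on a made-up toy puzzle with 4 squares (the square names and counts are hypothetical), followed by two calculations with figures from this page. The last line assumes that the filtered positions of a pair are the union of the positions filtered by its two squares. It also uses the rounded averages, so treat the result as a rough estimate only.

```python
from itertools import combinations

# Hypothetical toy puzzle: 4 squares and made-up solution counts per pair.
squares = ["A", "B", "C", "D"]
pair_solutions = dict(zip(combinations(squares, 2), [5, 0, 7, 2, 4, 6]))

# Total per square: add up all pairs that contain the square.
square_totals = {
    s: sum(n for pair, n in pair_solutions.items() if s in pair)
    for s in squares
}

pair_avg = sum(pair_solutions.values()) / len(pair_solutions)
square_avg = sum(square_totals.values()) / len(squares)

# Each pair is counted for both of its squares, so the ratio is
# (number of squares - 1): 3 here, 42 for the real 43 squares.
print(square_avg / pair_avg)

# Figures from this page.
print(round(2781 / 42, 1))   # pair average implied by the square average
print(2 * 113 - 215)         # rough average overlap per pair (assumes union)
```

With the rounded figures from this page, the last two lines give a pair average of about 66.2 and an average overlap of roughly 11 positions per pair.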

If anyone has a brilliant idea for a way to calculate (or at least estimate) the number of solutions per pair, let them speak up now.

4×2 statistics pages

We will explore the number of filtered positions vs. the number of solutions in 4 ways.
Each way comes with 2 pages: "Reverse sorted by number of solutions" and "Reverse sorted by number of filtered positions".
We will now discuss the content of these 4×2 pages, which all have the same setup. This setup is explained in a chapter further down this page: How to read the statistics pages

Statistics of all pairs, also split per form

Click here for the page reverse sorted by number of solutions
Click here for the page reverse sorted by number of filtered positions

These pages have 9 tables, all showing statistics of the 903 pairs. Table 1 compares the number of solutions with the total number of filtered positions; the other tables compare the number of solutions with the number of filtered positions per form. NOTE: the table "Reverse sorted by number of filtered positions" has been used extensively on the Definitions pages.

Statistics of pairs grouped per category

Click here for the page reverse sorted by number of solutions
Click here for the page reverse sorted by number of filtered positions

These pages contain 4 tables referring to 4 groups: 366 dates, 6 incorrect dates, 66 double months and 465 double numbers. Conclusion: the category "dates" (which is what this puzzle is all about) does not have the highest average number of solutions per pair; it is somewhere in the middle. On the other hand: each date has a solution, while 2 double months and 5 double numbers do not.

Statistics of pairs grouped per square

Click here for the page reverse sorted by number of solutions
Click here for the page reverse sorted by number of filtered positions

The pairs are grouped per square, resulting in 43 tables. That means every pair appears twice, i.e. in the list of both its squares.

Square summary, also split per form

Click here for the page reverse sorted by number of solutions
Click here for the page reverse sorted by number of filtered positions

This is the previous list summarised per square. This results in 1 table with all 43 squares. On top of that 8 tables are added to compare the number of solutions per square with the number of filtered positions per form.

How to read the statistics pages

All 4×2 pages are set up the same way: at the top you find a table of contents with internal links to sections on the page.

All page sections contain 3 components:

  • Section header and links
  • Table with totals, averages and other statistic information
  • Table with details

Section header and links

The section title contains the section name, followed by the number of possibilities. We are using "possibilities" here rather than "solutions", because we are already using that term for something else. The number of possibilities is equal to the length of the table with details.

In case the section handles one form, images of all (2, 4 or 8) orientations of that form are shown.

Next are 2 lines with links. 
The first line contains 1 link: if you are on the page "reverse sorted by number of solutions" you are directed to the corresponding section on the page "reverse sorted by number of filtered positions", and vice versa.
The second line contains 2 or 3 internal links:
- link to the top of the page
- if you are not in the first section: link to previous section
- link to next section OR (for last section) link to bottom of page

Table with totals, averages and other statistic information

This table has 2 columns, What and Value. There are 3 parts, with a fixed number of rows:
  • Part Solutions contains 5 rows
    • Total
    • Average
    • Minimum
    • Maximum
    • Standard deviation
  • Part Filtered positions (Total number of positions: 961) [961 is only applicable if all forms are taken into account!]
    It has the same rows as part Solutions, except for the row "Total", which is omitted because a total makes no sense here.
    The Total number of positions in the header is added as a reference for the found average, minimum and maximum.
  • Part Solutions vs filtered positions
    There is only 1 row: Correlation factor, which is typically around -1/3.
    In earlier versions a chi-squared calculation was included as well. But this hardly makes sense, as one can see from (1) the above graphs with trend lines and (2) the correlation, which is too far away from -1.
NOTE: The order of these parts may differ. The above listing is valid for pages "reverse sorted by number of solutions". For pages "reverse sorted by number of filtered positions" the first 2 parts are swapped.
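
As an illustration, the 5 rows of part Solutions could be computed as in the sketch below. The function name and the input numbers are hypothetical. Whether the pages use the population or the sample standard deviation is not stated here, so the choice of pstdev is an assumption.

```python
from statistics import mean, pstdev

def summarise(values):
    """Return the five rows of part "Solutions" for a list of counts."""
    return {
        "Total": sum(values),
        "Average": mean(values),
        "Minimum": min(values),
        "Maximum": max(values),
        # Population SD; an assumption, the pages may use the sample SD.
        "Standard deviation": pstdev(values),
    }

# Hypothetical solution counts, not taken from the real tables.
summary = summarise([2, 4, 4, 4, 5, 5, 7, 9])
for what, value in summary.items():
    print(f"{what}: {value}")
```

For part Filtered positions the same function would do, leaving out the row "Total".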

Table with details

This table has 4 columns:

  1. Ranking: In case of equal rankings, the first one has a number, while all the next ones are blank.
  2. The column to be sorted on:
    1. # solutions on pages "reverse sorted by number of solutions"
    2. # Filtered positions on pages "reverse sorted by number of filtered positions".
  3. Value & Link: if the value is a pair, it links to the details page for that pair.
    On the 2 pages "Square summary, also split per form", the value is a square and the link leads to the relevant section on page "Statistics per square reverse sorted by number of solutions".
  4. The counterpart of column 2: # filtered positions on pages "reverse sorted by number of solutions", and # solutions on pages "reverse sorted by number of filtered positions".
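
The ranking rule of column 1 can be sketched as follows. This is a simplified illustration with 3 columns instead of 4, a hypothetical function name and made-up rows. It also assumes that the row after a tie gets its position in the table as ranking (1, 2, blank, 4); the real pages may number differently.

```python
def ranked_rows(rows):
    """Sort (value, label) rows in descending order of value and add a ranking.

    Rows that tie with the row above get a blank ranking cell.
    """
    ordered = sorted(rows, key=lambda row: row[0], reverse=True)
    result = []
    previous = None
    for position, (value, label) in enumerate(ordered, start=1):
        rank = "" if value == previous else str(position)
        result.append((rank, value, label))
        previous = value
    return result

# Made-up rows: (number of solutions, pair label).
table = ranked_rows([(200, "P"), (305, "Q"), (200, "R"), (37, "S")])
for rank, value, label in table:
    print(f"{rank:>4} {value:>5}  {label}")
```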
