Chess Beyond the Rules

Heikki Hyötyniemi
Control Engineering Laboratory
P.O. Box 5400
FIN-02015 Helsinki Univ. of Tech.

Pertti Saariluoma
Cognitive Science
P.O. Box 13
FIN-00014 Helsinki University

1. Introduction

Chess memory has been a vital problem in the field of expertise research. It has been at the forefront of general skills research for decades, and a large number of important insights were first made by chess researchers. Djakov et al. (1926) as well as de Groot (1965, 1966) showed that there are essential differences between players of various skill levels in their ability to recall chess positions unfamiliar to them. A Dutch research group showed that skill differences disappear with random positions and short presentation times (see Vicente and de Groot 1990), and these data allowed the first synthesis (Chase and Simon 1973).

This synthesis rested on several important theoretical insights. Firstly, it assumed that experts store the presented chess positions in their working memory. Secondly, it assumed that they have a large store of chess-specific patterns in their long-term memory, which they can use much as ordinary people use words for storing texts. They build chunks of several pieces, and this allows them to store positions as sets of a few patterns instead of as individual pieces. Since a chunk occupies as much working memory as an individual piece, the performance of pattern-based recall is much better. Four chunks with four to
five pieces give the outcome of 20-25 recalled pieces, while novices recalling individual pieces cannot reach more than 4-5 pieces in all (Chase and Simon 1973).

The basic empirical findings have been confirmed numerous times since, and new information about chess masters' memory has been obtained on a regular basis. The initial view is gradually changing. Charness (1976) showed that experts do not store presented chess positions in their working memory, Goldin (1978) that recognition as a process is not the same as recall in chess, and Saariluoma (1985) that experts are better at detecting information both in game positions and in random positions.

During this decade, both new insights and careful reanalyses have been published. The most relevant series of experiments concerns location coding (Gobet and Simon 1996, Saariluoma 1984, 1994). This problem refers to the function of location information in the chess-specific memory representation of chess players. When a chunk is a set of a few regularly organized pieces, it must entail information about the kind and color of the pieces. It is much more problematic whether the absolute location of the pieces in terms of board coordinates is also a part of the chunk's representation, so that a pawn chain a2, b3, c4, for example, is represented differently from a pawn chain a3, b4, c5. Experiments that have manipulated the location of chunks systematically speak for absolute location coding.

The main problem we raise here concerns argumentation. Since Chase and Simon (1973) and Simon and Gilmartin (1973), it has been commonplace to argue that chess players have in their memory a vast storage of chess-specific chunks. What the structure of this knowledge is remains a very open question. In computer simulation models, which have been used to argue for the large
storage of chunks, the representation is a list (Gobet and Simon 1996, Simon and Gilmartin 1973). However, the development of computational modeling during the last two decades has provided us with very powerful new means to investigate memory representations. Various versions of neural computing systems give us new possibilities to analyze the structure of information in human memory. In this paper we approach the problem of pattern structure from a neural point of view. We ask whether connectionist approaches could provide chunk descriptions that are alternatives to the list-like representations typical of current models.

2. About the data model

The approach to chunk modeling here is connectionist rather than symbolic. This data-oriented starting point means that the discussion becomes somewhat mathematical.

In the experiments, the situations on a chess board were presented as 768 dimensional binary vectors (one bit for each piece type and color in each location, 12 times 64 bits altogether; the bit is 1 if the corresponding piece happens to be there, and 0 otherwise). Only the actual piece configurations are coded; no rules of chess, not even information about the succession of configurations, are utilized in any way.

The 768 dimensional observation samples define the configuration space. The observed configurations, typical positions during chess games, are now seen as measurement data. In Fig. 1, some typical expert-level games in the configuration space are shown: the 768 dimensional configuration space has been projected onto two dimensions, spanned by the two most significant principal components of the chess data (see Basilevsky, 1994). In Fig. 2, the data samples that were used in the experiments are all shown projected onto a three dimensional subspace.
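The 768 dimensional board encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the piece ordering in `PIECES`, the square indexing, and the function names `encode_board` and `start_position` are assumptions, since the paper does not specify how the 12 times 64 bits are ordered.

```python
# Hypothetical bit layout: the paper only states "12 times 64 bits".
# Here: 6 white piece types followed by 6 black ones, 64 squares each.
PIECES = "PNBRQKpnbrqk"
FILES = "abcdefgh"


def encode_board(placement):
    """Map {square name: piece letter} to a 768-element list of 0/1 bits."""
    vec = [0] * (len(PIECES) * 64)
    for square, piece in placement.items():
        file_idx = FILES.index(square[0])      # a..h -> 0..7
        rank_idx = int(square[1]) - 1          # 1..8 -> 0..7
        sq = rank_idx * 8 + file_idx           # 0..63, a1 = 0
        vec[PIECES.index(piece) * 64 + sq] = 1
    return vec


def start_position():
    """Return the initial chess position as a {square: piece} dictionary."""
    placement = {}
    for f, letter in zip(FILES, "RNBQKBNR"):
        placement[f + "1"] = letter            # white back rank
        placement[f + "2"] = "P"               # white pawns
        placement[f + "7"] = "p"               # black pawns
        placement[f + "8"] = letter.lower()    # black back rank
    return placement


v = encode_board(start_position())
print(len(v), sum(v))  # 768 32
```

Each piece on the board contributes exactly one bit, so `sum(v)` equals the number of pieces. This is consistent with the game trajectories in Fig. 1 approaching the origin as pieces are captured.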
Figure 1. Some complete chess games shown as trajectories in the configuration space. The original piece configuration is shown on the right; as more and more pieces are captured, the trajectory gets nearer to the origin of the configuration space.

Figure 2. The 2500 training configurations shown as a three dimensional image.
The statistical model (see Fig. 2) does not make the relevant data properties visible. When the high dimensional data is mechanically projected onto a lower dimension, the structure underlying the piece configurations can no longer be recognized. What would be a better framework for the chess data?

The ontological assumption now is that the data is concentrated in clusters, within which it is varied by special features. These clusters and features together are now identified as chunks. This statistical view of chunks is illustrated in Fig. 3: the data points in the configuration space are concentrated around prototypical clusters. Different opening strategies, for example, should determine cluster centers of their own; special features, or variations around the center, are caused by pieces that are not in their nominal locations.

Figure 3. A schematic view of configuration categories as cluster centers in the configuration space. The opening library examples are rather trivial chunk prototypes; in this case the category extends over the whole board, and dislocated pieces push the points in the configuration space away from the category center.
Figure 4. A rather typical local chunk standing for castling. Note that in practice the contents of the chunk vectors are numeric and continuous; visually understandable images can be obtained only after thresholding.

The same chunk structure can be used to represent the cluster (or category) centers and the fine-tuning features alike. One could speak of global and local chunks, respectively. An example of a local chunk is shown in Fig. 4.

The numeric nature of the chunks enhances their expressive power. Whereas the traditional, symbolic chunks are mutually exclusive, now linear weighted sums of the chunks can be used to reconstruct the piece configurations:

$f = a_1 c_1 + \cdots + a_n c_n$.

Above, $f$ represents the observed board expressed in the configuration space coordinates, $c_i$ stands for chunk number $i$, and $a_i$ is its weight. This formulation explicitly shows that the dimension of the chunk vectors is the same as the dimension of the configuration space; in principle, the chunks extend over the whole board (most of the vector entries being negligible, though).
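The weighted-sum reconstruction and the subsequent thresholding can be illustrated with a small sketch. The four-dimensional toy chunks, the function names `reconstruct` and `threshold`, and the threshold level of 0.5 are all assumptions made for illustration; the paper uses 768 dimensional chunk vectors and does not state its threshold.

```python
def reconstruct(chunks, weights):
    """Return the weighted sum of equal-length chunk vectors."""
    dim = len(chunks[0])
    f = [0.0] * dim
    for c, a in zip(chunks, weights):
        for k in range(dim):
            f[k] += a * c[k]
    return f


def threshold(f, level=0.5):
    """Turn a continuous reconstruction into 0/1 bits (level is assumed)."""
    return [1 if x > level else 0 for x in f]


# Toy chunks in a 4-dimensional space; real chunks would have 768 entries.
c1 = [0.9, 0.8, 0.1, 0.0]
c2 = [0.0, -0.7, 0.6, 0.1]

print(threshold(reconstruct([c1, c2], [1.0, 1.0])))   # [1, 0, 1, 0]
print(threshold(reconstruct([c1, c2], [1.0, -1.0])))  # [1, 1, 0, 0]
```

Because the chunks are real-valued and additive, a chunk can both add and cancel entries of another chunk, and changing the sign of a weight changes which entries survive thresholding.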
3. Chunking experiments

The data model was trained using 2500 configurations from expert level games; 256 chunks were extracted. The actual training process is not elaborated on here; only the results of using the final model are presented (for details, see Hyötyniemi, 1997).

In Fig. 5, the reconstruction errors during one game are plotted. It seems that when the configurations are typical, as in the beginning of the game, the skilled chunks can accurately explain the configuration.

Figure 5. Reconstruction errors during one game (compare to Figs. 6 and 8). Only the seven most appropriate of all the available chunk prototypes are utilized ("sparse coding"; compare to the human short-term memory capacity). Towards the end, when there are no longer many pieces on the board, the unskilled trivial chunks (explained later) can efficiently represent individual pieces (zero error being obtained, of course, if only 7 pieces remained). However, note that the skilled model cannot be beaten easily, even though much more memory capacity is allocated for this task in the extended model.
Figure 6. A typical chess configuration (cf. Fig. 5).

Figure 7. The reconstructed board standing for the typical situation in Fig. 6. Different numbers of chunks are used: first, only one chunk (the category prototype) is utilized (a), then three chunks (b), five chunks (c), and finally seven chunks (d).
Figure 8. An untypical chess configuration (cf. Fig. 5).

Figure 9. Reconstruction of the non-typical configuration shown in Fig. 8. On the left, 7 out of 256 skilled chunks are used; on the right, 7 out of a set of 256 skilled and 768 unskilled chunks are utilized. The unskilled chunks are in this case extremely localized; each of them represents one piece/location combination (so that 768 of them exhaust all one-piece alternatives). It is not a surprise that the non-optimized chunks enhance the robustness of the reconstruction in non-typical piece configurations.
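The sparse coding with seven chunks and the role of the 768 unskilled chunks can be made concrete with a small sketch. The paper does not describe how the seven chunks are selected, so the greedy matching-pursuit rule below is a stand-in assumption and not the authors' procedure; all function names are hypothetical. The example uses only the one-piece unskilled chunks and checks the remark in the caption of Fig. 5 that zero error is obtained when only seven pieces remain.

```python
DIM = 768  # 12 piece types x 64 squares, as in the paper's encoding


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def greedy_sparse_code(f, chunks, max_chunks=7):
    """Assumed selection rule (matching pursuit): at each step pick the
    chunk that removes the most squared error from the current residual.
    Returns ([(chunk index, weight), ...], residual vector)."""
    norms = [dot(c, c) for c in chunks]
    residual = [float(x) for x in f]
    code = []
    for _ in range(max_chunks):
        best = None  # (error reduction, chunk index, weight)
        for i, c in enumerate(chunks):
            if norms[i] == 0:
                continue
            proj = dot(residual, c)
            gain = proj * proj / norms[i]
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, i, proj / norms[i])
        if best is None:
            break  # residual is already fully explained
        _, i, w = best
        code.append((i, w))
        residual = [r - w * x for r, x in zip(residual, chunks[i])]
    return code, residual


def one_hot(i):
    """An unskilled chunk: exactly one piece/location combination."""
    v = [0] * DIM
    v[i] = 1
    return v


def board_with_bits(indices):
    v = [0] * DIM
    for i in indices:
        v[i] = 1
    return v


def squared_error(residual):
    return dot(residual, residual)


unskilled = [one_hot(i) for i in range(DIM)]

# Arbitrary bit positions standing for 7 and 8 remaining pieces.
seven = board_with_bits([4, 60, 100, 200, 324, 500, 700])
eight = board_with_bits([4, 60, 100, 200, 324, 500, 700, 767])

code7, res7 = greedy_sparse_code(seven, unskilled)
code8, res8 = greedy_sparse_code(eight, unskilled)
print(squared_error(res7), squared_error(res8))  # 0.0 1.0
```

With one-piece chunks, each selected chunk explains exactly one piece, so a budget of seven chunks leaves one piece unexplained as soon as eight pieces are on the board. Skilled chunks that cover several pieces at once are what make low error possible in fuller, typical positions.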
Figure 10. One of the 256 skilled and very specialized chunks: on the left, the chunk vector $c_{\mathrm{horse}}$ has been selected alone, so that its weight is $a_{\mathrm{horse}} = 1$, all other chunk weights being zero; on the right, the weight of the same chunk is now $a_{\mathrm{horse}} = -1$ (note thresholding). It seems that this chunk has been allocated for distinguishing between the two knight orientations: applying this chunk toggles between the two alternatives. In a way, it can be argued that the system has learned a complete chess move, even though no rules of chess and no succession of piece configurations have been shown to the algorithm explicitly.

4. Discussion

In traditional connectionist approaches it is usually assumed that the input-output mapping is all that matters (for example, see Bishop, 1995); here, on the other hand, it is assumed that structure is essential. This internal structure is reflected in the chunks; it seems that sometimes this structure, although based on simple statistical correlations, may reveal something that feels appealing (see Fig. 10). The role of representation rather than computation is emphasized as a tool to gain intuition.

The view of chunks is now very different from the original one: the chunks are in this case genuinely additive real-valued vectors spanning subspaces in the metric configuration space. The automatically emerging chunks often seem to
be natural and easily interpreted. Some of the chunks (those standing for basic configuration categories) extend over the whole board; it is not a surprise that especially the pawns, which do not move very much, constitute the skeleton in these chunks. On the other hand, there also emerge very specialized chunks that can be utilized to modify the basic category prototype. One striking difference compared to the standard chunk paradigm is that whereas chunks are usually thought to expand when trained, now the opposite takes place: originally the chunk prototypes, starting from random values, extend over the whole board, whereas training makes them shrink as the representation becomes sparse and the chunks become more specialized.

The compact view of mental images as sums of chunks helps to attack some of the difficult questions in chess expertise:

- The model explains the expert vs. novice performance in typical vs. untypical situations.
- The recognition process (or, actually, matching in this case) does not become more complex as new knowledge is acquired; rather, the operation becomes more automated when using the skilled chunks.
- Errors that are made in piece recall are rather plausible.

There are some experiments presented in the literature that can readily be simulated in this framework as well; the results using the numerical chunk model nicely reflect the behavior of human subjects. For example, see Saariluoma (1994): in Fig. 11, the interchange of quadrants is simulated. Similarly, the novice learning of chess configurations can be simulated in this framework (see Fig. 12; refer to Laine et al., 1998). It seems that the connectionist approach has real potential.
Figure 11. Quadrants reversed (compare to Fig. 6): the distorted configuration on the left, and its reconstruction on the right. The simulation reflects the experimental results: the quadrants that are intact are recalled rather well, whereas the interchanged ones cannot be recalled (note that according to the chunk model there must be a king in g1!).

Figure 12. Novice learning of chess configurations. The learning pattern in simulations resembles that of human subjects.
Bibliography

Basilevsky, A.: Statistical Factor Analysis and Related Methods. John Wiley & Sons, New York, 1994.

Bishop, C. M.: Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.

Charness, N.: Memory for chess positions: Resistance to interference. Journal of Experimental Psychology: Human Learning and Memory, 2, 1976, pp. 641-653.

Chase, W. G. and Simon, H. A.: The mind's eye in chess. In W. Chase (ed.), Visual information processing. Academic Press, New York, 1973.

de Groot, A. D.: Thought and Choice in Chess. Mouton, The Hague, 1965.

de Groot, A. D.: Perception and memory versus thought: Some old ideas and recent findings. In B. Kleinmuntz (ed.), Problem Solving. New York, 1966.

Djakov, I. N., Petrovsky, N. B., and Rudik, P. A.: Psihologia Shahmatnoi Igry (Chess psychology). Avtorov, Moscow, 1926.

Gobet, F. and Simon, H. A.: Templates in Chess Memory: A Mechanism for Recalling Several Boards. Cognitive Psychology, 31, 1996, pp. 1-40.

Goldin, S. E.: The effects of orienting tasks on recognition of chess positions. American Journal of Psychology, 91, 1978, pp. 659-671.

Hyötyniemi, H.: On the statistical nature of complex data. In SCAI'97 Sixth Scandinavian Conference on Artificial Intelligence: Research Announcements (ed. Grahne, G.), Helsinki University, Department of Computer Science, Report C-1997-49, Helsinki, Finland, 1997, pp. 13-27. Available from http://saato014.hut.fi/hyotyniemi/publications/.

Laine, T., Hyötyniemi, H., and Saariluoma, P.: The foundations of simulative theorizing. Proceedings of ECAI 98 (ed. Prade, H.), John Wiley & Sons, New York, 1998, pp. 109-113.

Saariluoma, P.: Coding problem spaces in chess. Commentationes scientiarum socialium, 23, 1984, Turku: Societas Scientiarum Fennica.

Saariluoma, P.: Chess players' intake of task relevant cues. Memory and Cognition, 13, 1985, pp. 385-391.

Saariluoma, P.: Location Coding in Chess. The Quarterly Journal of Experimental Psychology, 47A (3), 1994, pp. 607-630.

Simon, H. A. and Gilmartin, K.: A simulation of memory for chess positions. Cognitive Psychology, 5, 1973, pp. 29-46.

Vicente, K. J. and de Groot, A. D.: The memory recall paradigm: Straightening out the historical record. American Psychologist, 45, 1990, pp. 285-287.