Gomoku, sometimes called “Five in a Row,” extends the lineage of my adversarial search projects into a space of higher combinatorial complexity. As with most two-player board games, the rules are disarmingly simple: just place five stones in a row. The underlying search space, however, is immense and grows explosively with every move. Unlike Tic-Tac-Toe, where the full game tree can be exhaustively searched, or Connect 4, where clever optimisation can reduce the number of traversed branches, Gomoku’s game tree expands so rapidly that brute force becomes impossible almost immediately.
To illustrate this difference in complexity, we can examine the first move across the three games. On an empty board, Tic-Tac-Toe's 3x3 grid gives us 9 spaces in which to place our first piece, but which space is most likely to lead to a win? The Minimax algorithm helps us answer that question by assessing future states of the game. Assuming the opponent plays optimally, it finds the move that maximises our chances of winning while minimising the opponent's. To make this evaluation, however, a naive minimax must traverse:
- ~255K nodes to explore all sequences of possible moves for Tic-Tac-Toe
- ~10^21 nodes to explore all sequences of possible moves for Connect 4
- ~10^360 nodes to explore all sequences of possible moves for Gomoku, hundreds of orders of magnitude more than the number of atoms in the observable universe (a toy minimax for the Tic-Tac-Toe case is sketched below)
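That sketch, in Python, simply expands every legal continuation from the empty board; the board encoding (a 9-character string) and the helper names are illustrative assumptions of mine, not code from this project.

```python
# Naive minimax for Tic-Tac-Toe: no pruning, no memory, just a full
# expansion of every legal continuation from the current position.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board: str):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board: str, player: str):
    """Return (best score for X, number of complete game sequences explored)."""
    w = winner(board)
    if w == "X":
        return 1, 1
    if w == "O":
        return -1, 1
    if "." not in board:
        return 0, 1                          # draw
    nxt = "O" if player == "X" else "X"
    scores, games = [], 0
    for i, cell in enumerate(board):
        if cell == ".":
            s, g = minimax(board[:i] + player + board[i + 1:], nxt)
            scores.append(s)
            games += g
    return (max(scores) if player == "X" else min(scores)), games

score, games = minimax("." * 9, "X")
print(score, games)   # 0 (perfect play is a draw), 255168 game sequences
```

Even this tiny tree takes a noticeable moment to expand in pure Python; scaling the same brute-force idea to a 15x15 Gomoku board is exactly what becomes hopeless.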
This made Gomoku an ideal testing ground for creativity: it demanded algorithms that go beyond naive search and balance systematic reasoning with adaptive learning. For me, Gomoku became a laboratory for ideas, a place to experiment and a personal bridge between classical adversarial search techniques and modern inspirations like AlphaGo.
Why Gomoku, and Why Memory?
After Tic-Tac-Toe and Connect 4, I wanted to step into an environment where brute force was no longer viable. Gomoku struck the perfect balance: easy to explain, but nearly impossible to fully compute. The challenge was to design an engine that could not only play reasonably well in the short term but also accumulate lessons from previous games. I was fascinated by the idea that memory could act as a kind of lightweight analogue to deep learning. Inspired by AlphaGo and AlphaZero, I asked myself: what if instead of training massive networks, I created a system where knowledge from past play was baked directly into the search process? Could classical algorithms like Minimax be given a kind of intuition by remembering what had worked before?
Building the Memory-Mapped Engine
MemoryMap Data Structure:
- At its core, the MemoryMap is a global record: a table that maps board states and moves to weights representing their historical success rates (a minimal sketch follows this list).
- Every time a move contributes to a win, its weight is nudged upward; when it leads to a loss, it is penalized.
- Over time, good moves naturally rise in prominence while poor moves fade away, preventing the system from repeatedly falling into the same traps.
- This means that with each new game played, the AI becomes more seasoned, carrying forward the echoes of its past experience.
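The sketch below captures that record in miniature, assuming a hashable board key (for example, the grid serialised to a string) and (row, col) moves; the method names and default update sizes are my own assumptions rather than the engine's exact values.

```python
class MemoryMap:
    """Maps (board_key, move) pairs to weights learned from finished games."""

    def __init__(self, reward: float = 1.0, penalty: float = 1.0):
        self.weights = {}            # (board_key, move) -> accumulated weight
        self.reward = reward         # nudge applied after a win
        self.penalty = penalty       # nudge applied after a loss

    def weight(self, board_key, move) -> float:
        return self.weights.get((board_key, move), 0.0)

    def record_game(self, history, won: bool) -> None:
        """history: the (board_key, move) pairs this agent played in one game."""
        delta = self.reward if won else -self.penalty
        for board_key, move in history:
            key = (board_key, move)
            self.weights[key] = self.weights.get(key, 0.0) + delta
```

Because every finished game adds or subtracts only a small delta, moves that keep contributing to wins accumulate weight, while moves implicated in losses drift back toward zero and below.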
Search Algorithm with Memory:
- The foundation is still Minimax with Alpha-Beta pruning, the classic backbone of adversarial search.
- But rather than treating every branch equally, the MemoryMap biases exploration toward lines of play that history suggests are more promising (see the sketch after this list).
- In situations where multiple moves are heuristically tied, the MemoryMap acts as the tie-breaker, injecting experience into the decision.
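The sketch below shows the general shape of that integration, under assumptions of mine: `legal_moves`, `apply_move`, `game_over`, `evaluate`, `move_heuristic`, and `board_key` are stand-in helpers rather than the engine's real functions. The MemoryMap enters only through move ordering, where remembered weight is the secondary sort key, so it breaks heuristic ties and steers which branches alpha-beta examines first.

```python
import math

def alphabeta(board, depth, alpha, beta, maximizing, memory):
    """Minimax with alpha-beta pruning and memory-biased move ordering (sketch)."""
    if depth == 0 or game_over(board):
        return evaluate(board), None

    key = board_key(board)
    # Order moves by heuristic first, remembered weight second, so that
    # historically successful moves win heuristic ties and get searched early.
    moves = sorted(
        legal_moves(board),
        key=lambda m: (move_heuristic(board, m), memory.weight(key, m)),
        reverse=maximizing,
    )

    best_move = None
    if maximizing:
        best = -math.inf
        for m in moves:
            score, _ = alphabeta(apply_move(board, m), depth - 1,
                                 alpha, beta, False, memory)
            if score > best:
                best, best_move = score, m
            alpha = max(alpha, best)
            if beta <= alpha:        # prune: the opponent will never allow this line
                break
    else:
        best = math.inf
        for m in moves:
            score, _ = alphabeta(apply_move(board, m), depth - 1,
                                 alpha, beta, True, memory)
            if score < best:
                best, best_move = score, m
            beta = min(beta, best)
            if beta <= alpha:        # prune
                break
    return best, best_move
```

Better move ordering also helps alpha-beta itself: the earlier a strong move is searched, the more of the remaining branches can be cut off.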
Evaluation Functions:
- I used straightforward heuristics based on patterns: pairs, triplets, open fours, and the infamous double-threes.
- The scoring function assigns different weights to these features, encouraging the AI to extend strong lines or block dangerous threats (a sketch follows this list).
- Defensive and offensive moves are balanced—blocking an opponent’s winning move is valued nearly as much as advancing one’s own position.
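A sketch of that scoring idea, scanning every line of five cells on the board: the dictionary-of-stones board representation and the weight values are illustrative assumptions, and a fuller version would also distinguish open from blocked fours and detect double-threes explicitly.

```python
# Illustrative weights: longer uninterrupted runs are worth disproportionately more.
PATTERN_WEIGHTS = {2: 10, 3: 100, 4: 1_000, 5: 100_000}

def evaluate(board, me, opponent, size=15):
    """board: dict {(row, col): 'X' or 'O'}; positive scores favour `me`."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    score = 0
    for r in range(size):
        for c in range(size):
            for dr, dc in directions:
                cells = [(r + i * dr, c + i * dc) for i in range(5)]
                if not all(0 <= x < size and 0 <= y < size for x, y in cells):
                    continue
                stones = [board.get(cell) for cell in cells]
                mine, theirs = stones.count(me), stones.count(opponent)
                if mine and not theirs:
                    score += PATTERN_WEIGHTS.get(mine, 0)
                elif theirs and not mine:
                    # Defence is weighted almost as heavily as offence.
                    score -= PATTERN_WEIGHTS.get(theirs, 0)
    return score
```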
Self-Correcting Dynamics:
- The MemoryMap doesn’t just reinforce good habits; it actively discourages bad ones.
- If the system overvalues a poor move in the short term, repeated exposure through self-play will eventually push that weight back down (see the sketch after this list).
- This self-correcting loop ensures that even without sophisticated neural networks, the agent learns from mistakes in a way that feels organic.
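A sketch of that loop, assuming a `play_one_game` helper (not shown) that plays two memory-guided agents against each other and returns the result plus each side's (board_key, move) history:

```python
def self_play(memory, games=1000):
    """Repeated self-play: every result nudges the shared MemoryMap weights."""
    for _ in range(games):
        result, histories = play_one_game(memory)   # e.g. "black", "white", or "draw"
        if result == "draw":
            continue                                 # leave weights untouched on draws
        for player, history in histories.items():
            memory.record_game(history, won=(player == result))
```

An overvalued move keeps getting chosen, keeps contributing to losses, and therefore keeps being penalised until the search stops preferring it, which is exactly the self-correction described above.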
AlphaGo Inspiration:
- AlphaGo relied on neural networks to provide priors to its Monte Carlo Tree Search.
- My MemoryMap serves a similar role, but in a more classical, lightweight fashion. Instead of a neural prior, I use accumulated experience as the guiding signal.
- While less powerful, this parallel showed me how the principles of modern AI can be reframed through simpler, more transparent mechanisms.
What Emerged in Practice
(Preliminary observations; more systematic results to follow after further experimentation.)
Even at an early stage, the integration of MemoryMap noticeably improved efficiency. The AI spent less time exploring dead-end branches and more time reinforcing lines that actually mattered. This gave the impression of an opponent that was learning over time—avoiding obvious pitfalls it had previously fallen into and gravitating toward strategies that had proven reliable. While it didn’t yet rival the sophistication of a neural-enhanced engine, the system proved that even classical search can feel adaptive when infused with memory.
Where It Could Go Next
- Massive Self-Play: Automating millions of games between agents to fill the MemoryMap with rich experience, steadily sharpening the AI.
- Neural Hybridization: Adding a lightweight policy or value network to work alongside the MemoryMap, blending classical heuristics, memory, and learning.
- Cross-Domain Testing: Extending the same idea into other games such as chess, shogi, or even non-grid games, to see if historical weighting generalizes well.
- Dynamic Heuristics: Letting the AI adjust heuristic weights automatically, based on which patterns succeed most often in practice.
Challenges and Curiosities
One of the trickiest aspects was dealing with positions that contained multiple simultaneous threats. For example, facing two open threes at once often caused indecision: should the AI block, extend, or attempt something unexpected? Pure heuristics struggled to rank these situations correctly. The MemoryMap helped somewhat, but not always fast enough to prevent disaster. These situations highlighted the limits of memory alone and reinforced the need for layered strategies: heuristics for tactical urgency, memory for long-term learning, and perhaps neural components for nuanced judgment.
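The kind of check involved looks roughly like the sketch below: count the opponent's open threes (three stones in a row with both ends empty) and flag any position containing two or more as urgent. The board conventions match the evaluation sketch above and are assumptions of mine.

```python
def count_open_threes(board, player, size=15):
    """Count _XXX_ patterns (three stones with both ends empty) for `player`."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    threes = 0
    for r in range(size):
        for c in range(size):
            for dr, dc in directions:
                # Window of five cells: empty, stone, stone, stone, empty.
                cells = [(r + i * dr, c + i * dc) for i in range(-1, 4)]
                if not all(0 <= x < size and 0 <= y < size for x, y in cells):
                    continue
                window = [board.get(cell) for cell in cells]
                if (window[0] is None and window[4] is None
                        and all(s == player for s in window[1:4])):
                    threes += 1
    return threes

def is_double_threat(board, opponent) -> bool:
    """Two or more simultaneous open threes usually cannot all be blocked."""
    return count_open_threes(board, opponent) >= 2
```

Detecting the double threat is the easy part; deciding what to do about it, block, counter-threaten, or race for a faster win, is where heuristics, memory, and deeper search all have to cooperate.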