Saturday, April 25, 2026

A* Algorithm, in DataScript

An example of presenting an algorithm in the DataScript meta-language (.ds3)

Introduction to the A* Algorithm


  • A* search algorithm - Wikipedia
  • A* search algorithm - Rosetta Code


A* Search Algorithm — Tutorial

What Is A*?

A* (pronounced "A-star") finds the shortest path between two points in a graph or grid.
It was invented in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael at Stanford Research Institute.

It's the standard pathfinding algorithm in games, robotics, and navigation.

The Family of Search Algorithms

A* doesn't exist in isolation — it's the final step in a natural progression:

1. Breadth-First Search (BFS)

Explores in all directions equally. Like dropping a stone in water — the frontier
expands as a ring. Finds shortest path when all moves cost the same.

frontier: int[] = []        # FIFO — first in, first out
ds_push(frontier, start)
came_from[start] = -1       # all other cells start at -2 (unvisited)

while len(frontier) > 0
   current := frontier[0]
   # shift front (dequeue)
   for i in 0..<len(frontier) - 1
      frontier[i] = frontier[i + 1]
   ds_pop(frontier)

   current_x := current % W          # recover (x, y) from the flat index
   current_y := current / W          # W = grid width, integer division
   for d in 0..<8
      nx := current_x + DX[d]
      ny := current_y + DY[d]
      if nx >= 0 and nx < W and ny >= 0 and ny < H   # stay on the grid
         ni := to_index(nx, ny)
         if came_from[ni] == -2       # not visited
            ds_push(frontier, ni)
            came_from[ni] = current
  • Guarantees: shortest path (uniform cost)
  • Problem: explores everything — slow when you know where the goal is
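The snippets in this post lean on a few helpers that are never shown: a row-major to_index and the 8-neighbour offset tables DX/DY. A minimal sketch of what they are assumed to look like (W and H are assumed names for the grid width and height):

```
W := 8                      # grid width  (assumed)
H := 8                      # grid height (assumed)
DX: int[] = [ 1, -1, 0, 0, 1, 1, -1, -1 ]   # x offsets for the 8 neighbours
DY: int[] = [ 0, 0, 1, -1, 1, -1, 1, -1 ]   # matching y offsets

# to_index(x, y) flattens a coordinate into a flat array index:
#    ni = y * W + x
# and the inverse recovers the coordinate:
#    x = ni % W,   y = ni / W     (integer division)
```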

2. Dijkstra's Algorithm (1956)

Like BFS, but handles different movement costs (roads vs mud vs mountains).
Uses a priority queue ordered by actual cost from start.

# open set as Node[] with f = cost_so_far (no heuristic)
# cost_so_far starts at a large sentinel for every cell, so the first real cost wins
open: Node[] = []
ds_push(open, Node { x: sx, y: sy, f: 0 })
cost_so_far[to_index(sx, sy)] = 0

while len(open) > 0
   # find node with lowest f (cost_so_far)
   best := 0
   for bi in 1..<len(open)
      if open[bi].f < open[best].f
         best = bi
   cur := open[best]
   open[best] = open[len(open) - 1]
   ds_pop(open)

   ci := to_index(cur.x, cur.y)
   for d in 0..<8                    # bounds check on (nx, ny) omitted for brevity
      nx := cur.x + DX[d]
      ny := cur.y + DY[d]
      ni := to_index(nx, ny)
      new_cost := cost_so_far[ci] + move_cost(ci, ni)
      if new_cost < cost_so_far[ni]
         cost_so_far[ni] = new_cost
         came_from[ni] = ci
         ds_push(open, Node { x: nx, y: ny, f: new_cost })
  • Guarantees: optimal path with varying costs
  • Problem: still explores in all directions — doesn't "aim" for the goal
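move_cost is referenced but never defined. For the Rosetta Code setup described later (cost 1 per step, barrier cells cost 100), one plausible sketch, assuming a grid array of 0/1 cells:

```
# cost of stepping from cell ci into cell ni (ci is unused here, but kept
# for generality — e.g. terrain cost that depends on both endpoints)
cost := 1
if grid[ni] == 1          # barrier cell
   cost = 100
```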

3. Greedy Best-First Search

Uses a heuristic to aim straight for the goal. Fast but can be fooled by obstacles.

# same as Dijkstra but priority = heuristic only (ignores actual cost)
f := heuristic(nx, ny, gx, gy)      # estimated distance to goal
ds_push(open, Node { x: nx, y: ny, f: f })
  • Fast: explores far fewer nodes than Dijkstra
  • Problem: NOT optimal — can find longer paths, gets stuck behind walls

4. A* — The Best of Both

The key insight: combine actual cost with estimated remaining cost.

f(n) = g(n) + h(n)

g(n) = actual cost from start to n         (like Dijkstra)
h(n) = estimated cost from n to goal       (like Greedy)
f(n) = total estimated cost through n      (priority for the queue)

open: Node[] = []
ds_push(open, Node { x: sx, y: sy, f: heuristic(sx, sy, gx, gy) })
g_score[to_index(sx, sy)] = 0       # all other g_score entries start at a large sentinel

while len(open) > 0
   # find node with lowest f = g + h
   best := 0
   for bi in 1..<len(open)
      if open[bi].f < open[best].f
         best = bi
   cur := open[best]
   open[best] = open[len(open) - 1]
   ds_pop(open)

   ci := to_index(cur.x, cur.y)
   if cur.x == gx and cur.y == gy
      break                              # early exit!

   for d in 0..<8
      nx := cur.x + DX[d]
      ny := cur.y + DY[d]
      ni := to_index(nx, ny)
      new_g := g_score[ci] + move_cost(ci, ni)
      if new_g < g_score[ni]
         g_score[ni] = new_g
         came_from[ni] = ci
         f := new_g + heuristic(nx, ny, gx, gy)
         ds_push(open, Node { x: nx, y: ny, f: f })
  • Guarantees: optimal path (if heuristic is admissible)
  • Fast: explores far fewer nodes than Dijkstra
  • Early exit: can stop as soon as the goal is reached
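As written, the loop can pop the same cell more than once if a cheaper route to it is pushed later. With a consistent heuristic the first pop is already optimal, so a closed array (the DS3 implementation keeps one) can skip the rework. A sketch of the check, placed right after the pop:

```
# closed starts as all zeros; expand each cell at most once
if closed[ci] == 0
   closed[ci] = 1
   # ... goal test and expansion of the 8 neighbours go here ...
```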

The Heuristic — The Critical Choice

The heuristic h(n) estimates remaining distance. The choice of heuristic depends on
how movement works.
 This is where most implementations go wrong.

Rules

  1. Admissible: must never overestimate the true cost (otherwise path may not be optimal)
  2. Consistent: h(a) <= cost(a,b) + h(b) for any edge (a,b) — guarantees nodes are
     processed at most once
  3. Higher is better (as long as still admissible) — prunes more of the search space
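A quick sanity check of rule 2 on a single diagonal edge of the king-move grid, using Chebyshev distance to a goal at (7, 7) (illustrative numbers only):

```
# edge a = (0,0) -> b = (1,1), cost(a,b) = 1 (king move)
h_a := 7                  # max(|7-0|, |7-0|)
h_b := 6                  # max(|7-1|, |7-1|)
# consistency: h(a) <= cost(a,b) + h(b)  ->  7 <= 1 + 6  — holds
```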

Common Heuristics for Grids

Movement                     Heuristic    Formula
4-direction (no diagonals)   Manhattan    |dx| + |dy|
8-direction (king moves)     Chebyshev    max(|dx|, |dy|)
Any angle (free movement)    Euclidean    sqrt(dx² + dy²)
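The three formulas can be written out in DS3 style. Nothing in this post shows a square-root primitive, so ds_sqrt below is a hypothetical helper:

```
# nx, ny = current cell; gx, gy = goal — adx/ady hold the absolute deltas
adx := gx - nx
if adx < 0
   adx = -adx
ady := gy - ny
if ady < 0
   ady = -ady

manhattan := adx + ady                       # 4-direction movement
chebyshev := adx                             # 8-direction (king moves)
if ady > chebyshev
   chebyshev = ady                           # = max(adx, ady)
euclid := ds_sqrt(adx * adx + ady * ady)     # any-angle movement (hypothetical ds_sqrt)
```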

Why Most Rosetta Code Implementations Are Wrong

The Rosetta Code A* task uses 8-directional king movement (all moves cost 1,
including diagonals). The correct heuristic is Chebyshev distance: max(|dx|, |dy|).

Heuristic Used               Admissible?   Problem
Chebyshev max(|dx|, |dy|)    Yes           Correct — exact match for king moves
Manhattan |dx| + |dy|        No            Overestimates diagonals (says 2, truth is 1)
Euclidean sqrt(dx²+dy²)      No            Also overestimates diagonals (1.41, truth is 1)
Squared Euclidean dx²+dy²    No            Grossly overestimates — completely wrong
Zero                         Yes           Degrades to Dijkstra — "not really A*"

The C, C++, Java, JavaScript, Dart, and Common Lisp implementations on Rosetta Code
were all flagged for using wrong heuristics.

Rule of thumb: match your heuristic to your movement model.

Path Reconstruction

A* doesn't store the path during search — it stores came_from[node] = previous_node.
After reaching the goal, walk backwards:

path: int[] = []
pi := to_index(gx, gy)
while pi != -1
   ds_push(path, pi)
   pi = came_from[pi]
# reverse
lo := 0
hi := len(path) - 1
while lo < hi
   tmp := path[lo]
   path[lo] = path[hi]
   path[hi] = tmp
   lo = lo + 1
   hi = hi - 1

Complexity

Aspect     Value
Time       O(E log V) with binary heap — E edges, V vertices
Space      O(V) — stores g-scores, came_from, open/closed sets
Optimal    Yes, if h(n) is admissible
Complete   Yes, on finite graphs

In practice, a good heuristic makes A* explore far fewer nodes than Dijkstra.
On a grid with a clear path, A* can be 10-100x faster.

When to Use What

Situation                                  Algorithm
Uniform cost, explore everything           BFS
Varying costs, no target                   Dijkstra
Need speed, don't need optimal             Greedy Best-First
Need optimal path to a specific goal       A*
Very large graphs, need suboptimal fast    Weighted A* (f = g + w*h, w > 1)
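The last row needs only one changed line: Weighted A* keeps everything else and inflates the priority. A sketch, assuming Node.f can hold the scaled value:

```
# Weighted A*: inflate h by w > 1 — explores fewer nodes, and the
# resulting path is at most w times longer than optimal
w := 1.5
f := new_g + w * heuristic(nx, ny, gx, gy)
ds_push(open, Node { x: nx, y: ny, f: f })
```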

DataScript Implementation

The DS3 implementation in source/ds3/a_search.ds3 follows the Rosetta Code task:

  • Grid: 8×8, cells are 0 (open) or 1 (barrier, cost 100)
  • Movement: 8 directions (king moves), cost 1 per step
  • Heuristic: Chebyshev distance — max(|dx|, |dy|) — correct for this movement model
  • Data structures: arrays for g_score, came_from, closed; Node records in open list
  • Priority queue: simple scan-for-minimum (fine for 64-cell grid; for large grids, use a heap)

Expected output: path of 11 steps avoiding the U-shaped barrier.
