Saturday, April 25, 2026

A* Algorithm, in DataScript

An example of presenting an algorithm in the DataScript meta-language (.ds3)

Introduction to the A* Algorithm


  • A* search algorithm - Wikipedia
  • A* search algorithm - Rosetta Code


A* Search Algorithm — Tutorial

What Is A*?

A* (pronounced "A-star") finds the shortest path between two points in a graph or grid.
It was invented in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael at Stanford Research Institute.

It's the standard pathfinding algorithm in games, robotics, and navigation.

The Family of Search Algorithms

A* doesn't exist in isolation — it's the final step in a natural progression:

1. Breadth-First Search (BFS)

Explores in all directions equally. Like dropping a stone in water — the frontier
expands as a ring. Finds shortest path when all moves cost the same.

frontier: int[] = []        # FIFO — first in, first out
ds_push(frontier, start)
came_from[start] = -1       # all other cells start at -2 (unvisited)

while len(frontier) > 0
   current := frontier[0]
   # shift front (dequeue)
   for i in 0..<len(frontier) - 1
      frontier[i] = frontier[i + 1]
   ds_pop(frontier)

   current_x := current % W          # recover (x, y) from the flat index
   current_y := current / W          # W = grid width, integer division
   for d in 0..<8
      nx := current_x + DX[d]
      ny := current_y + DY[d]
      if nx >= 0 and nx < W and ny >= 0 and ny < H   # stay on the grid
         ni := to_index(nx, ny)
         if came_from[ni] == -2       # not visited
            ds_push(frontier, ni)
            came_from[ni] = current
  • Guarantees: shortest path (uniform cost)
  • Problem: explores everything — slow when you know where the goal is
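The snippets in this post lean on a few helpers that are never shown: a row-major to_index and the 8-neighbour offset tables DX/DY. A minimal sketch of what they are assumed to look like (W and H are assumed names for the grid width and height):

```
W := 8                      # grid width  (assumed)
H := 8                      # grid height (assumed)
DX: int[] = [ 1, -1, 0, 0, 1, 1, -1, -1 ]   # x offsets for the 8 neighbours
DY: int[] = [ 0, 0, 1, -1, 1, -1, 1, -1 ]   # matching y offsets

# to_index(x, y) flattens a coordinate into a flat array index:
#    ni = y * W + x
# and the inverse recovers the coordinate:
#    x = ni % W,   y = ni / W     (integer division)
```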

2. Dijkstra's Algorithm (1956)

Like BFS, but handles different movement costs (roads vs mud vs mountains).
Uses a priority queue ordered by actual cost from start.

# open set as Node[] with f = cost_so_far (no heuristic)
# cost_so_far starts at a large sentinel for every cell, so the first real cost wins
open: Node[] = []
ds_push(open, Node { x: sx, y: sy, f: 0 })
cost_so_far[to_index(sx, sy)] = 0

while len(open) > 0
   # find node with lowest f (cost_so_far)
   best := 0
   for bi in 1..<len(open)
      if open[bi].f < open[best].f
         best = bi
   cur := open[best]
   open[best] = open[len(open) - 1]
   ds_pop(open)

   ci := to_index(cur.x, cur.y)
   for d in 0..<8                    # bounds check on (nx, ny) omitted for brevity
      nx := cur.x + DX[d]
      ny := cur.y + DY[d]
      ni := to_index(nx, ny)
      new_cost := cost_so_far[ci] + move_cost(ci, ni)
      if new_cost < cost_so_far[ni]
         cost_so_far[ni] = new_cost
         came_from[ni] = ci
         ds_push(open, Node { x: nx, y: ny, f: new_cost })
  • Guarantees: optimal path with varying costs
  • Problem: still explores in all directions — doesn't "aim" for the goal
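move_cost is referenced but never defined. For the Rosetta Code setup described later (cost 1 per step, barrier cells cost 100), one plausible sketch, assuming a grid array of 0/1 cells:

```
# cost of stepping from cell ci into cell ni (ci is unused here, but kept
# for generality — e.g. terrain cost that depends on both endpoints)
cost := 1
if grid[ni] == 1          # barrier cell
   cost = 100
```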

3. Greedy Best-First Search

Uses a heuristic to aim straight for the goal. Fast but can be fooled by obstacles.

# same as Dijkstra but priority = heuristic only (ignores actual cost)
f := heuristic(nx, ny, gx, gy)      # estimated distance to goal
ds_push(open, Node { x: nx, y: ny, f: f })
  • Fast: explores far fewer nodes than Dijkstra
  • Problem: NOT optimal — can find longer paths, gets stuck behind walls

4. A* — The Best of Both

The key insight: combine actual cost with estimated remaining cost.

f(n) = g(n) + h(n)

g(n) = actual cost from start to n         (like Dijkstra)
h(n) = estimated cost from n to goal       (like Greedy)
f(n) = total estimated cost through n      (priority for the queue)

open: Node[] = []
ds_push(open, Node { x: sx, y: sy, f: heuristic(sx, sy, gx, gy) })
g_score[to_index(sx, sy)] = 0       # all other g_score entries start at a large sentinel

while len(open) > 0
   # find node with lowest f = g + h
   best := 0
   for bi in 1..<len(open)
      if open[bi].f < open[best].f
         best = bi
   cur := open[best]
   open[best] = open[len(open) - 1]
   ds_pop(open)

   ci := to_index(cur.x, cur.y)
   if cur.x == gx and cur.y == gy
      break                              # early exit!

   for d in 0..<8
      nx := cur.x + DX[d]
      ny := cur.y + DY[d]
      ni := to_index(nx, ny)
      new_g := g_score[ci] + move_cost(ci, ni)
      if new_g < g_score[ni]
         g_score[ni] = new_g
         came_from[ni] = ci
         f := new_g + heuristic(nx, ny, gx, gy)
         ds_push(open, Node { x: nx, y: ny, f: f })
  • Guarantees: optimal path (if heuristic is admissible)
  • Fast: explores far fewer nodes than Dijkstra
  • Early exit: can stop as soon as the goal is reached
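As written, the loop can pop the same cell more than once if a cheaper route to it is pushed later. With a consistent heuristic the first pop is already optimal, so a closed array (the DS3 implementation keeps one) can skip the rework. A sketch of the check, placed right after the pop:

```
# closed starts as all zeros; expand each cell at most once
if closed[ci] == 0
   closed[ci] = 1
   # ... goal test and expansion of the 8 neighbours go here ...
```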

The Heuristic — The Critical Choice

The heuristic h(n) estimates remaining distance. The choice of heuristic depends on
how movement works.
 This is where most implementations go wrong.

Rules

  1. Admissible: must never overestimate the true cost (otherwise path may not be optimal)
  2. Consistent: h(a) <= cost(a,b) + h(b) for any edge (a,b) — guarantees nodes are
     processed at most once
  3. Higher is better (as long as still admissible) — prunes more of the search space
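A quick sanity check of rule 2 on a single diagonal edge of the king-move grid, using Chebyshev distance to a goal at (7, 7) (illustrative numbers only):

```
# edge a = (0,0) -> b = (1,1), cost(a,b) = 1 (king move)
h_a := 7                  # max(|7-0|, |7-0|)
h_b := 6                  # max(|7-1|, |7-1|)
# consistency: h(a) <= cost(a,b) + h(b)  ->  7 <= 1 + 6  — holds
```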

Common Heuristics for Grids

Movement                     Heuristic    Formula
4-direction (no diagonals)   Manhattan    |dx| + |dy|
8-direction (king moves)     Chebyshev    max(|dx|, |dy|)
Any angle (free movement)    Euclidean    sqrt(dx² + dy²)
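The three formulas can be written out in DS3 style. Nothing in this post shows a square-root primitive, so ds_sqrt below is a hypothetical helper:

```
# nx, ny = current cell; gx, gy = goal — adx/ady hold the absolute deltas
adx := gx - nx
if adx < 0
   adx = -adx
ady := gy - ny
if ady < 0
   ady = -ady

manhattan := adx + ady                       # 4-direction movement
chebyshev := adx                             # 8-direction (king moves)
if ady > chebyshev
   chebyshev = ady                           # = max(adx, ady)
euclid := ds_sqrt(adx * adx + ady * ady)     # any-angle movement (hypothetical ds_sqrt)
```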

Why Most Rosetta Code Implementations Are Wrong

The Rosetta Code A* task uses 8-directional king movement (all moves cost 1,
including diagonals). The correct heuristic is Chebyshev distance: max(|dx|, |dy|).

Heuristic Used               Admissible?   Problem
Chebyshev max(|dx|, |dy|)    Yes           Correct — exact match for king moves
Manhattan |dx| + |dy|        No            Overestimates diagonals (says 2, truth is 1)
Euclidean sqrt(dx²+dy²)      No            Also overestimates diagonals (1.41, truth is 1)
Squared Euclidean dx²+dy²    No            Grossly overestimates — completely wrong
Zero                         Yes           Degrades to Dijkstra — "not really A*"

The C, C++, Java, JavaScript, Dart, and Common Lisp implementations on Rosetta Code
were all flagged for using wrong heuristics.

Rule of thumb: match your heuristic to your movement model.

Path Reconstruction

A* doesn't store the path during search — it stores came_from[node] = previous_node.
After reaching the goal, walk backwards:

path: int[] = []
pi := to_index(gx, gy)
while pi != -1
   ds_push(path, pi)
   pi = came_from[pi]
# reverse
lo := 0
hi := len(path) - 1
while lo < hi
   tmp := path[lo]
   path[lo] = path[hi]
   path[hi] = tmp
   lo = lo + 1
   hi = hi - 1

Complexity

Aspect     Value
Time       O(E log V) with binary heap — E edges, V vertices
Space      O(V) — stores g-scores, came_from, open/closed sets
Optimal    Yes, if h(n) is admissible
Complete   Yes, on finite graphs

In practice, a good heuristic makes A* explore far fewer nodes than Dijkstra.
On a grid with a clear path, A* can be 10-100x faster.

When to Use What

Situation                                  Algorithm
Uniform cost, explore everything           BFS
Varying costs, no target                   Dijkstra
Need speed, don't need optimal             Greedy Best-First
Need optimal path to a specific goal       A*
Very large graphs, need suboptimal fast    Weighted A* (f = g + w*h, w > 1)
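The last row needs only one changed line: Weighted A* keeps everything else and inflates the priority. A sketch, assuming Node.f can hold the scaled value:

```
# Weighted A*: inflate h by w > 1 — explores fewer nodes, and the
# resulting path is at most w times longer than optimal
w := 1.5
f := new_g + w * heuristic(nx, ny, gx, gy)
ds_push(open, Node { x: nx, y: ny, f: f })
```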

DataScript Implementation

The DS3 implementation in source/ds3/a_search.ds3 follows the Rosetta Code task:

  • Grid: 8×8, cells are 0 (open) or 1 (barrier, cost 100)
  • Movement: 8 directions (king moves), cost 1 per step
  • Heuristic: Chebyshev distance — max(|dx|, |dy|) — correct for this movement model
  • Data structures: arrays for g_score, came_from, closed; Node records in open list
  • Priority queue: simple scan-for-minimum (fine for 64-cell grid; for large grids, use a heap)

Expected output: path of 11 steps avoiding the U-shaped barrier.
