A thought about BFS queue size

There was a Topcoder SRM this morning and one of the problems was essentially to compute the shortest distance between two points on a 2-dimensional grid.  I usually never stick around for the challenge round because I usually dwell on two or three programs, not really understanding them, before time runs out. But I’m going to try reading code from now on because, well, it’s a weakness of mine, and also you get to see different approaches to problems you’d thought were pretty hard.

Back to the problem, another guy in my room took a different approach from mine and implemented breadth-first search. Basically, you expand outwards uniformly from the source node searching for a destination node. The advantage is that once you find the destination node, you have the shortest path and are done. Sadly, a mistake in the implementation ruined this good approach.

It took me a while to figure out that the guy had implemented his own queue for BFS, using a fixed-size array and pointers to the front and back.  The terminating condition in this case would be to check whether the front and back pointers were the same, ie. the queue is empty. It wasn’t until I could run the program in a debugger that I noticed the program was segfaulting due to the front pointer extending way beyond the upper bound of the array. However! I didn’t recognize this for what it was and decided to patch things up by turning the queue into a circular queue.

This fixed the problem for the grid-type graphs that the problem was giving out, but it left me wondering about a fixed queue size that could handle all cases. In the worst case, BFS will visit every node in a graph once, and in theory, it’s possible to queue up all of the nodes at the very first step. The program that I “fixed” with a circular queue was using a fixed size queue with a size that was an order of magnitude lower than the maximum number of nodes that could be dished out by the problem. I was definitely lucky that the number of nodes enqueued at any one time never exceeded the magic number being used to size the queue.

So it’s interesting to think about what that running number – the number of nodes enqueued at any given time during BFS – represents and whether we are able to give an upper bound for it. At any given moment, it gives the size of a “cross-section” of the graph. I guess without knowing about the structure of the graph, you can’t really assume much and so you have to bound the queue by the total nodes in the graph. The graphs in this problem were 4-regular, ie. every node has 4 adjacent nodes, and since the program was passing, this upper bound is definitely an overestimate.

I guess the moral of the story is to use the data structures provided by your local library, because thinking about upper bounds can be a deal breaker, and implementing your own data structure on the fly is error-prone and definitely cost this guy a hundred and some points.