
Introduction
The iterative solution to the connected component labelling algorithm
is well described in the literature, but requires quite complex methods
when implemented. The simpler recursive solution has the problem of
using more stack than usually available, even for small images.
This article presents the recursive 4-connected component labelling
algorithm with a workaround for the stack limitation. All in less than
70 lines of C/C++ code. The implementation is written in pure C, so it
can be used in any C/C++ project no matter what the platform and
compiler are.
Background
The connected component labelling is often used in the fields of computer vision and image analysis.
Using the Code
The code consists of a single source file. To use it, simply include the code into your C/C++ project and call the function LabelImage.
void LabelImage(unsigned short width, unsigned short height, unsigned char* input, int* output);
Input image is an array of bytes with 0 values being background and
other values (typically 1 or 255) indicating an object. It is often
found by thresholding.
Output image (the labelled image) is an array of integers with 0
values being background and label numbers starting with 1 up to the
number of labels found.
Output image must be allocated in advance and initialized to zero.
Limitations
Sizes of input and output images are limited to a maximum of width x
height = (65535 x 65535 pixels) due to unsigned short typically being 16
bit.
Sizes of input and output images are also limited by the available
heap size (about "6*width*height" bytes heap memory is allocated
temporarily, which is in the order of 1.5 times the size of the output
image).
Example of how to use is shown below:
unsigned short W = 640;
unsigned short H = 480;
unsigned char* input = (unsigned char*) malloc(W*H*sizeof(unsigned char));
int* output = (int*) malloc(W*H*sizeof(int));
memset(output, 0, W*H*sizeof(int));
LabelImage(W, H, input, output);
The data of the input and output images is layed out directly as continuous IPL images or matrices in OpenCV
(which I think is a standard way of distributing rows and columns of
data in images). This makes it very easy to use OpenCV together with the
algorithm:
Example of how to use in an OpenCV project is as shown below:
unsigned short W = 640;
unsigned short H = 480;
Mat input (H, W, CV_8UC1);
Mat output(H, W, CV_32SC1, Scalar(0));
LabelImage(W, H, (unsigned char*)input.data, (int*)output.data);
To initialize the input image in the above examples, the distribution of rows and columns must be known as stated previously.
It is exemplified by the following simple demonstration, a dummy image with (width, height) = (8,6) pixels:

- Rows are numbered 0 .. 5
- Columns are numbered 0 .. 7
- Shown pixel value of 255 shown at index = column + row*width = 6 + 2*8 = 22, i.e. input[22] = 255.
Implementation
As the method is a modification of the standard recursive algorithm, let's first show how this look like.
The standard 4-connected component recursive algorithm written in C/C++ (recursive procedure LabelComponent comes in the next code snippet):
void LabelImage(unsigned short width, unsigned short height, unsigned char* input, int* output)
{
int labelNo = 0;
int index = -1;
for (unsigned short y = 0; y < height; y++)
{
for (unsigned short x = 0; x < width; x++)
{
index++;
if (input [index] == 0) continue;
if (output[index] != 0) continue;
labelNo++;
LabelComponent(width, height, input, output, labelNo, x, y);
}
}
}
The recursive part comes here in the procedure LabelComponent:
void LabelComponent(unsigned short width, unsigned short height,
unsigned char* input, int* output, int labelNo, unsigned short x, unsigned short y)
{
int index = x + width*y;
if (input [index] == 0) return;
if (output[index] != 0) return;
output[index] = labelNo;
if (x > 0) LabelComponent(width, height, input, output, labelNo, x-1, y);
if (x < width-1) LabelComponent(width, height, input, output, labelNo, x+1, y);
if (y > 0) LabelComponent(width, height, input, output, labelNo, x, y-1);
if (y < height-1) LabelComponent(width, height, input, output, labelNo, x, y+1);
}
Now we only need to rewrite procedure LabelComponent to avoid stack overflow. But to do this, first we must understand what happens when a procedure is called:
- Calling a procedure consists of putting the procedure parameters
onto a stack together with the program address just after the procedure.
Then jump to the procedure address (see macro
CALL_LabelComponent below). All parameter variables in the procedure then refer to their actual stack position. - When the procedure ends (the return points in the procedure - see macro
RETURN below), pop the values of the stack and jump to the address just after the procedure (which also was put onto the stack).

The whole program has been re-written with own implemented stack (the four macro names CALL_LabelComponent, RETURN, X, and Y should increase readability):
CALL_LabelComponent corresponds to a recursive call of the LabelComponent procedure RETURN indicates the end of a call to the LabelComponent procedure X and Y (upper case) resemble the local variables x and y (lower case)
#define CALL_LabelComponent(x,y,returnLabel) { STACK[SP] = x;
STACK[SP+1] = y; STACK[SP+2] = returnLabel; SP += 3; goto START; }
#define RETURN { SP -= 3; \
switch (STACK[SP+2]) \
{ \
case 1 : goto RETURN1; \
case 2 : goto RETURN2; \
case 3 : goto RETURN3; \
case 4 : goto RETURN4; \
default: return; \
} \
}
#define X (STACK[SP-3])
#define Y (STACK[SP-2])
void LabelComponent(unsigned short* STACK, unsigned short width, unsigned short height,
unsigned char* input, int* output, int labelNo, unsigned short x, unsigned short y)
{
STACK[0] = x;
STACK[1] = y;
STACK[2] = 0;
int SP = 3;
int index;
START:
index = X + width*Y;
if (input [index] == 0) RETURN;
if (output[index] != 0) RETURN;
output[index] = labelNo;
if (X > 0) CALL_LabelComponent(X-1, Y, 1);
RETURN1:
if (X < width-1) CALL_LabelComponent(X+1, Y, 2);
RETURN2:
if (Y > 0) CALL_LabelComponent(X, Y-1, 3);
RETURN3:
if (Y < height-1) CALL_LabelComponent(X, Y+1, 4);
RETURN4:
RETURN;
}
void LabelImage(unsigned short width, unsigned short height, unsigned char* input, int* output)
{
unsigned short* STACK = (unsigned short*) malloc(3*sizeof(unsigned short)*(width*height + 1));
int labelNo = 0;
int index = -1;
for (unsigned short y = 0; y < height; y++)
{
for (unsigned short x = 0; x < width; x++)
{
index++;
if (input [index] == 0) continue;
if (output[index] != 0) continue;
labelNo++;
LabelComponent(STACK, width, height, input, output, labelNo, x, y);
}
}
free(STACK);
}
Points of Interest
The described implementation is so simple, that it is easy to customize to your own need, for example like:
- Change it to find the 8-connected components.
- Change data type of input or output image to your needs.
- By
just modifying the 2 lines of code that test for the input image pixel
being zero with a test against a threshold, you will save running
through the image for doing a fixed threshold first.
Apart from being an easy to use and easy to customize algorithm
solving the connected component labelling problem, it also presents a
method to implement a recursive algorithm that has no easy iterative
solution, even though here it would result in stack overflow. The
drawback is that the method is not generally applicable, only in the
cases where the needed stack size is within what normally can be
allocated at the heap.
The CALL and RETURN macros in the
implementation can be made smarter. Giving up on the cross platform and
ANSI C compliance, the GNU GCC compiler allows using labels as values
(the && operator extension). Thus the switch statement in the RETURN macro can be avoided, but at the cost of using pointer sized entries in the stack.
History
- 2014-10-01 First version of article
This member has not yet provided a Biography. Assume it's
interesting and varied, and probably something to do with programming.