Change Detection for Linear Features in Aerial Photographs

�Using Edge-Finding

Neil C. Rowe and Lynne L. Grewe (*)

Code CS/Rp, U.S. Naval Postgraduate School

Monterey, CA 93943 USA; rowe@cs.nps.navy.mil

* - Currently with Computer Science Department, California State University,

Hayward, Hayward, CA 94542 USA, grewe@csuhayward.edu

Abstract

We describe a system for automatic change detection of linear features such as roads and buildings in aerial photographs.� Rather than compare pixels, we match major line segments and note those without counterparts.� Experiments show our methods to be promising for images of around 2-meter resolution.

Acknowledgments: This work was sponsored by the U.S. Navy Tactical Exploitation of National Capabilities (TENCAP) program.

This paper appeared in the IEEE Transactions on Geoscience and Remote Sensing, Vol. 39, No. 7 (July 2001), 1608-1612.

1. Introduction

Change analysis between aerial photographs taken at different times is valuable for analyzing the effects of weather, crop diseases, environmental degradation, and expansion of urban centers [7]. Usually this involves subtracting brightness values, in some spectral range, of corresponding pixels [2] to get a "difference image", perhaps with some mathematical adjustment using various local and global features [5, 8], and presenting this graphically to a user with some coding scheme.� This works well for broad-scale (100-meter resolution) images and large-scale changes such as vegetation growth. However, changes of military significance are often more localized, as the changes to buildings and roads that were of particular concern in the 1990s in Iraq and in aerial surveillance of Kosovo in 1999. Military needs often focus attention on changes to straight-line segments -- their appearances, disappearances, size and orientation changes -- and the shapes which the segments form.� So they need edge analysis and shape-recognition techniques [3, 9, 10].� Military applications cannot assume good weather or lighting conditions, so methods must be robust.� Also, many important changes will be on a 5-meter or 2-meter scale and we will need sufficient resolution.

Our approach is to coarsely register two images relative to one another; normalize the brightness and contrast; find edge cells; extract significant line segments; find the best matches of line segments between the images; calculate the pixel-location mapping function between the images; subtract out corresponding segments and edge pixels; and present the "difference" pixels highlighted on the original images to the human intelligence analyst.� The results can also be used to automatically update terrain databases [1].� Matching of edge segments has less sensitivity to changes in illumination, vegetation, and viewing conditions than pixel subtraction does; it permits comparison of quite different pictures like black-and-white with color infrared, like the U.S. Geological Survey images of the United States which alternate every six years between those formats and do not respond well to direct subtraction.� Matching of edge segments can also be made tolerant of small errors in registration due to shadows, perspective, and changes in vegetation and water, thus reducing effects of the important misregistration problem in pixel subtraction [4].

2. Image normalization

For image differencing, it helps if the images are as similar as possible.� We convert color images to grayscale by summing brightness values over all spectral ranges, and normalize the brightness and contrast.� (Normalization has only a minor effect on edge-finding but helps the analyst viewing the output of processed images.)� For this we filter the first image of each pair in the wavelet domain, followed by a 3 by 3 median substitution on both images.� Wavelet transforms are good for normalization because they permit different adjustments on different image scales.� Our initial wavelet normalization is "global", based on parameters of the entire image.� Specifically, the first image is transformed to the wavelet domain, smoothed, multiplied to equate its mean to that of the second image, and transformed back to the image domain.� We also explored "local" normalization which works successively on small corresponding areas of the images, using our calculated fine-grain registration described below; it requires extra time since segment matching must be done twice, but can help accuracy.

3. Registration of image pairs

We assume two aerial photographs roughly covering the same area, taken at different times with not necessarily the same camera, film, or viewing conditions.� We assume that 50% of their area overlaps, their scale differs by not more than 30%, their orientation differs by not more than 0.3 radians, and their obliqueness (angle from directly overhead) is no more than 0.3 radians.� These limits represent our judgement as to what humans would consider a matching pair of images; they also correspond experimentally to limits beyond which our system is inefficient.� Specifications in the metadata of images often guarantee these limits; or a user can enforce them by explicitly matching points in the images, or we can use a known registration for a neighbor image pair, but this research explored what could be done without such extra help.� This leaves a remaining problem of fine-grain registration, necessary for detecting small changes between the images, and desirable to automate since people find it difficult and tedious.� Many good methods can be used, like matching road intersections [6].� But since we need to compare line segments anyway for change analysis, we match them for registration.

3.1 Finding line segments

We use the Canny edge finder [10] to find "edge cells" or pixels between significantly different brightness values.� We preferred the Canny method since it produced longer connected segments than the other algorithms we tried, which helps in matching.� Edge cells are found twice using two thresholds: one yielding about 10% of the image as edge pixels, and a threshold 25% of that.� The first edge image is used for edge matching, and the second only at the last step to rule out borderline edges, edges above the threshold in one image and below in the other.

To find line segments from edge pixels, we calculate the Hough transform [10] (a mapping from pixel locations to possible line equations containing them) of the 10% edge image.� We weight it by the inverse of the corresponding line length lying within the image boundaries, to compensate for the reduced edge-cell opportunities of lines crossing less of the image.� We find the local maxima of the transform array.� Then in descending order of maxima height, we find edge cells within 1 pixel of the corresponding line, and try to find clusters of 10 or more contiguous cells.� Maxima are only considered if greater than the expected number of edge cells if pixels were distributed randomly.� We then discard non-cluster edge pixels, recalculate the transform on the remaining edge pixels, and search again for line segments.� This second pass can find segments confusable with nearby ones found in the first pass, like the parallel line segments of roads.

For each pixel cluster found, we determine endpoints, orientation , and offset D for the least-squares line.� Then since the sign of the contrast rarely changes along a road or building edge (with exceptions such as with long changing shadows, snowfall, and drought), we redefine �on a range of 0 to �where the brighter region lies to the right in the direction of orientation.

3.2 Segment-to-segment matching

We now do fine-grain registration of the two images by matching their edge segments.� However, several matches to the same segment may lie within the coarse-registration tolerance since ground structures often have parallel edges.� Fortunately, several clues can filter out incorrect matches.

We first find good line-segment matches of the 100 longest segments in each of the two images.� The assumption that image rotation is less than 0.3 radians rules out around 95% of possible matches.� We also rule out matches differing more than 2:1 in length; this rules out around 80% of the remaining matches, 90% of the discarded matches being indeed invalid.� Finally, we superimpose the images and measure two factors for each segment pair, "parallel distance" (the distance of the midpoint of the first segment from the line containing the second segment) and "collinear distance" (the distance between the projections of the segments onto the line through the second segment, defined to be negative if the projections overlap).� We limit both factors to 16% of the average length of the diagonal of the two images for this initial matching, ruling out about 68% of the remaining segment matches.� If necessary, we exclude the edges with the most possible matches to reduce the total number of matches to no more than 100, since such edges are the most ambiguous.

Figure 1 shows two example edge images.� Edge 1 of the left image can match either 11 or 13 of the right image, but not 12 because of its orientation.� Edge 6 cannot match edge 12 because it is more than twice as long.� Depending on the accuracy of the superposition and the thresholds, edge 13 can be matched to 3 or 5 but probably not 1 by the parallel distance criterion, and probably not 4 by the collinear distance criterion.

3.3 Matching groups of line-segment matches

We now filter segment matches by considering their consistency.� We start with pairs of segment matches (that is, four segments in two matched pairs).� Correct segment matches will appear in more of the reasonable pairs than incorrect matches will.� We require the rotation angle be consistent within 0.1 radians for a pair; this rules out about 83% of the pair-pair matches.� We also rule out matches with inconsistent scales or angles for their segment pairs, which rules about another 20% of the pair-pair matches.� We then select the best 25% of the pair-pair matches with weighted criteria including segment lengths (pairs with longer segments are more reliable), length consistency, and the distance between the segments in each image (segments further apart provide more accurate estimation of parameters).

Then we consider possible sets of three match pairs ("triples") built from three identified pairs.� We can fit a unique 4-parameter linear mapping from the equations of the six lines through the six segments, avoiding reliance on the possibly inaccurate endpoints: �and .� Here �and �are translational offsets, s is the scale factor, and �is the image rotation.� This assumes images at the same viewing angle, but otherwise the error is usually small, and this assumption is relaxed anyway in the next step.

With this, the line �is mapped to �where �and .� Given three segment-match pairs, we estimate the rotation as average of the three matched orientation differences; substituting this in the line equation yields three linear equations in , , and s.� We solve these, and eliminate triples with fits outside the acceptable ranges of translation, rotation, and scaling.� We also superimpose the segments using the mapping function, and eliminate triples whose collinear distance is nonnegative.� A good triple for Figure 1 would be 1-11, 2-12, and 3-13 because the segments are close enough and different enough in orientation and size to provide good constraints.� Now since we usually have weeded out most incorrect segment matches, we eliminate triples with atypical mapping functions, ones with rotation and scale factors more than 0.1 from the median.� These criteria together eliminate 99.9% of the triples on the average.

We now combine triples to obtain sextuples of matched segments.� This permits a least-squares fit to the general linear mapping between two planes, which can handle arbitrary oblique images: and .� We impose reasonability constraints on matches that 0.7 <a < 1.3, 0.7 <e <1.3, -0.3 < b < 0.3, �-0.3 <d < 0.3, 0.1 < a-e, 0.1 < b-d, and that c and f must be smaller than the half the larger dimension of the corresponding image.� This typically eliminates 99% of the possible sextuples, but there is much variability.

It now likely that the best sextuples correctly match the images. Thus we average the fits for the 200 best-fit sextuples to get a consensus mapping function.� We then map all (not just 100) segments of the first image to the second image using it.� For each segment of the first image, the best possible segment match in the second image is found that obeys the constraints on orientation, size, parallel distance, and collinear distance.� We refit the six parameters using these matches to get our final mapping function.� Optionally, this can be used to better normalize the images by local correction with the wavelet transform.

4. Calculating the edge-difference image

Now we can use our mapping function and segments to create two "edge-difference images", binary images showing the edge pixels that do not correspond to edge pixels in the other image.� To obtain them, we filter edge pixels instead of segments:

1. Edge pixels in either image mapped outside the boundaries of� the corresponding image are eliminated.

2. All pixels of edge segments matched to an edge segment of the other image are eliminated.

3. Edge pixels that map within a pixel of a segment of the other image, or to an edge pixel at the lower (25%) threshold mentioned in section 3.1, are eliminated.� This helps eliminate pieces of broken edges.

4. The remaining edge pixels for each image are clustered into sets of contiguous pixels.� Clusters of less than a minimum size (currently 2) are excluded.

The remaining edge pixels comprise the "on" cells of the edge-difference image.� They are highlighted on the two images and presented to the user for inspection, superimposed in dark black or red on the whitened original images.� A human expert should be able to see patterns of change at a glance.� Figures 2 and 3 show example output for an industrial park in Monterey, California, USA between 1987 (color infrared film) and 1993 (conventional black-and-white film), the first pair listed in Figure 4.� Parts (but not all) of new buildings and cleared areas were found and large vegetation changes were noted.

5. Experimental results

Our program is around 1000 lines of source code primarily in the Matlab language, with wavelet normalization in Java.� Processing averaged 12 minutes for a 256 by 256 image on a Windows NT workstation with a processor having a 2 nanosecond cycle, reasonable for this mostly-uncompiled program; roughly equal time is spent on each of segment finding, fine-grain registration, and building the edge-difference images.� Tables 1 and 2 give processing statistics on representative image pairs extracted from U.S. Geological Survey photographs, where the earlier image was color infrared and the later was conventional black-and-white.� The nonlinearity in the registration mapping averaged 3 pixels for these pairs.� Note that a segment can initially match more than one counterpart.

We measured performance by estimating precision and recall by randomly sampling the output.� Precision was defined as the ratio of the number of "difference pixels" ("on" pixels of the edge-difference images) correctly identified to the number of difference pixels identified; recall was defined as the ratio of the number of difference pixels correctly identified to the number of correct difference pixels.� (Counting pixels rather than segments is better because longer segments matter more than shorter segments.)� Average precision was 0.59 and average recall was 0.54 over the ten tested image pairs with around 8300 correct difference pixels found. In contrast, precision was 0.04 and recall was 0.10 for a control algorithm that eliminates edge pixels matching within a pixel length for the same mapping function, and performance was further worse when manual registration results were used.� Our output should save time for intelligence analysts by focusing their attention.

6. References

[1] W. Baer, L. Grewe, and N. Rowe, "Use of image feedback loops for real time terrain feature extraction," IS&T/SPIE Symposium on Electronic Imaging '99, January 1999.

[2] M. Carlotto, "Detection and analysis of change in remotely sensed imagery with application to wide area surveillance," IEEE Transactions on Image Processing, vol. 6, no. 1, pp. 189-202, January 1997.

[3] R. Collins, C. Jaynes, Y. Q. Cheng, X. Wang, F. Stolle, E. Riseman, E., and A. Hanson, "The Ascender system: automated site modeling from multiple aerial images," Computer Vision and Image Understanding, vol. 72, no. 2, pp. 143-162, November 1998.

[4] X. Dai and S. Khorram, "The effects of image misregistration on the accuracy of remotely sensed change detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1566-1577, September 1998.

[5] K. Greene, B. Kempka, and L. Lackey, "Using remote sensing to detect and monitor land-cover and land-use change," Photogrammetric Engineering and Remote Sensing, vol. 60, no. 3, pp. 331-337, March 1994.

[6] J. W. Hsieh, H. Y. Liao, K. C. Fan, M.-T. Ko, and Y. P. Hung, "Image registration using a new-edge based approach," Computer Vision and Image Understanding, vol. 67, no. 2, pp. 112-130, August 1997.

[7] T. Lillesand and R. Kiefer, Remote Sensing and Image Interpretation.� New York: Wiley, 1987.

[8] J.-.F. Mas, "Monitoring land-cover changes: a comparison of change detection techniques," International Journal of Remote Sensing, vol. 20, no. 1, pp. 139-152, 1999.

[9] B. Nicolin and R. Gabler, "A knowledge-based system for analysis of aerial images," IEEE Transactions on Geoscience and Remote Sensing, vol. GE-25, pp. 317-329, May 1987.

[10] J. C. Russ, The image processing handbook, second edition. Boca Raton, FL: CRC Press, 1995.

Figure and Table captions

Figure 1: Example line segments extracted from aerial photographs in 1987 and 1993 of Salinas, California, USA.

Figure 2: Example output for a 1987 photograph of Monterey, California, USA; dark black points represent edges present in this image but not the 1993 one.

Figure 3: Corresponding output to Figure 2 for a 1993 photograph of Monterey, California, USA; dark black points represent edges present in this image but not the 1987 one.

Table 1: Statistics on the test image pairs: Pair is image pair number, Res is resolution, and Rows and Cols indicate the number of pixels per row and per column of each image.

Table 2: Statistics on processing of test image pairs.� Pair is image-pair number, Segs is number of line segments found in each image, Step 1 shows the number of initial segment matches, Step 2 the number of pairs of matches, Step 3 the number of triples of matches, Step 4 the number of sextuples of matches, Step 5 the number of segment matches after fitting with the sextuple average, Final the final number of matches, Prec (precision) is the ratio of difference pixels correctly identified to those identified, and Rec (recall) is ratio of difference pixels correctly identified to all correct difference pixels.

1

2

4

3

5

6

11

12

13

Figure 1: Example line segments extracted from aerial photographs in 1987 and 1993 of Salinas, CA, USA.

Figure 2: Example output for a 1987 photograph of Monterey, CA, USA; dark black points represent edges present in this image but not in the 1993 one.

Figure 3: Corresponding output to Figure 2 for a 1993 photograph of Monterey, CA, USA; dark black points represent edges present in this image but not in the 1987 one.

Pair	Type	Res	Rows1	Cols1	Rows2	Cols2
A	Semi-urban	3m.	225	273	218	288
B	Urban	1m.	304	327	304	352
C	Semi-urban	3m.	333	320	344	337
D	Urban	3m.	266	318	233	298
E	Urban	2m.	330	294	313	303
F	Urban	2m.	419	278	443	266
G	Rural	1m.	213	286	207	273
H	Semi-urban	5m.	228	311	255	266
I	Rural	1m.	227	212	254	202
J	Semi-urban	3m.	340	258	277	238

Table 1: Statistics on the test image pairs: Pair is image pair number, Res is resolution, and Rows and Cols indicate the number of pixels per row and per column of each image.

Pair	Segs	Step 1	Step 2	Step 3	Step 4	Step 5	Final	Prec	Rec
A	322 260	260	635	51	28	37	92	0.7	0.4
B	282 352	190	1341	898	11	50	117	0.8	0.5
C	449 392	163	918	423	2305	71	170	0.6	0.6
D	358 253	282	533	43	17	25	55	0.3	0.8
E	492 598	249	996	495	524	73	177	0.5	0.7
F	383 364	320	1269	442	100	79	182	0.6	0.8
G	234 141	259	635	13	5	14	33	0.5	0.2
H	254 264	306	825	65	35	32	88	0.6	0.3
I	254 221	294	933	440	2388	74	154	0.6	0.7
J	385 246	283	1002	493	1379	86	193	0.7	0.4