in Low Frame-rate Video

Jared Friedman

January 14, 2005

Final Project Report

Computer Science 283

Abstract

In this paper, I present a novel approach to estimate mean traffic speed using low

frame-rate video taken from an uncalibrated camera. This approach takes advantage of a

known relationship between traffic speed and traffic density to make tracking of

individual vehicles unnecessary. The algorithm has been developed especially for

nighttime conditions, though extensions to daytime images seem quite possible. It has

been tested on several image sequences and shown to produce results consistent with

human estimations from those sequences.

Introduction

Computer vision techniques have been applied to images of traffic scenes for a

variety of purposes [1]. One of the more popular of these purposes is to attempt to extract

a measure of the level of congestion of the road in the scene. Properly disseminated, this

information can be used by drivers to plan routes that avoid traffic and by first responders

to identify accidents. The measurement typically used for congestion is the mean speed

of traffic, as this is the measurement a rational traveler should care about. In this paper, I

present a new algorithm to estimate mean traffic speed using video images at low frame

speed. The work is motivated by the presence of Trafficland, a new company that offers

video streams from over 400 traffic cameras in the Washington D.C. area through free

internet access (at www.trafficland.com). Previous work has estimated traffic speed

directly, essentially by tracking vehicles for a known time over a

known distance and calculating an average ratio. However, due to bandwidth limitations,

Trafficland’s cameras give video at less than one frame per second, with irregular time

intervals between frames that are difficult to determine, making tracking extremely

difficult, if possible at all. Some published algorithms [2],[3] instead place two virtual lines or

“tripwires” on the road at a known separation and measure the time interval between cars

crossing the first and crossing the second, in the natural computer vision analogue of the

physical loop detectors on roads. While this may seem to be a different technique from

tracking, it shares many similarities, including the assumption that cars will not travel

much from one frame to the next, and in practice it requires an even higher frame speed

than tracking.

Nevertheless, clearly humans are able to judge traffic levels from Trafficland’s

video, and they would still be able to do so even if shown only every two or three frames,

making tracking literally impossible. I assert that the way we make this judgment is by

determining how closely spaced the cars are and using the intuitive fact that closely

spaced cars tend to travel more slowly than widely spaced ones. More precisely, we use

the inverse correlation between mean traffic speed and traffic density, which is defined as

vehicles per lane-mile [4]. The approach of this paper is to take advantage of this

relationship and compute density directly, which is easier to compute at low frame

speeds, and convert this into a speed using the known relationship.

To compute density in a particular region of interest, we must know the number of

lanes of the region, the length of the region, and the number of cars in the region in each

frame. Some traffic vision systems have had to accommodate cameras that could be

rotated and zoomed by traffic operators with joysticks (e.g., [1], [5]), and thus have had to

build in some automatic calibration capability to their programs. However, Trafficland’s

cameras appear to be stationary, and we make the assumption that the number of lanes

and the length of the region need only be determined once, and take advantage of human

input in this one-time low-cost setup procedure. Many published and commercial

systems [6] [7] also require some initial human setup: for one, if there are multiple roads

or two directions of a single road in the picture (as is usually the case), the software

cannot possibly know which road is the intended one without some human input.

Specifically, the initial calibration setup simply requires a human to draw a rectangle (in

world coordinates) around an area of interest and then to trace out the lanes in the region.

Using some simple geometric constraints and assuming a typical lane width, the length of

the region can then be calculated.
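
To make the target quantity concrete, the following is a minimal Python sketch of the density computation, assuming the per-frame car counts, the number of lanes, and the region length are already available. The function name, the averaging over frames, and the unit conversion constant are illustrative assumptions, not part of the implementation reported here.

```python
FEET_PER_MILE = 5280.0


def density_per_lane_mile(car_counts, n_lanes, region_length_ft):
    """Mean traffic density in vehicles per lane-mile.

    car_counts: number of cars counted in the region in each frame.
    n_lanes: number of lanes in the region (assumed constant).
    region_length_ft: length of the region along the road, in feet.
    """
    if not car_counts:
        raise ValueError("need at least one frame")
    if n_lanes <= 0 or region_length_ft <= 0:
        raise ValueError("n_lanes and region_length_ft must be positive")
    # Averaging over frames is an assumption made for this sketch.
    mean_count = sum(car_counts) / len(car_counts)
    lane_miles = n_lanes * region_length_ft / FEET_PER_MILE
    return mean_count / lane_miles


# Example: three lanes, a 528-foot region, about six cars per frame.
example_density = density_per_lane_mile([5, 7, 6], 3, 528.0)
```

The resulting density is the value that would be passed to the speed-density relationship.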

The calculation of the number of vehicles in the region proved to be more

challenging than expected. This is partially because surprisingly little previous work has

been done on the problem. Several algorithms have developed excellent tracking of cars

in daytime conditions at high frame speeds, which implies that they are able to recognize

vehicles to some extent. However, in tracking vehicles directly, it is not necessary to

segment cars properly, but only to identify blobs that correspond to multiple vehicles or

parts of vehicles, since in a traffic stream all the vehicles, their parts and their shadows

tend to move at about the same velocity. The algorithm reported in [1] does require

correct segmentation of vehicles, because it must estimate their size correctly, but it

requires correct segmentation of only a few vehicles at a time, and thus it simply throws

away any blobs that do not correspond to a tightly defined vehicle profile. Accurate

counting of vehicles in daytime conditions requires a more sophisticated approach to deal

with occlusion, shadows, and vehicles of widely varying appearance. In this preliminary

report, I chose to focus on nighttime conditions only and to leave daytime conditions for

future work. Nighttime conditions are easier because at night, it is usually possible to

simply count the number of headlights appearing in the region, and headlights are much

more visible and less vulnerable to occlusion, shadow, and varying appearance than cars.

Nighttime conditions are in any case a more suitable potential use of the algorithm

advanced in this paper, since tracking-based systems ordinarily find daytime conditions

much easier than nighttime conditions, giving the density approach a particular

advantage in these conditions.

This paper first reviews the key assumptions of the algorithm and discusses their

validity and which ones could be relaxed in further work. I then discuss in detail the

workings of the algorithm and follow with some considerations of its computational

efficiency. I conclude with empirical results validating the accuracy of the algorithm.

Underlying Assumptions

The following gives a list of the key assumptions used to simplify the problem

and some discussion of their validity.

1) Images are taken at night. Headlights are the brightest objects in the region of interest.

2) Traffic is moving generally towards the camera, but not (almost) directly into it. The

second requirement exists because when traffic is going almost directly towards the

camera, the glare from the headlights creates bloom, lens flares, and severe distortion.

The first requirement exists because if the traffic is moving away from the camera, the

headlights will not be directly visible. In my opinion, the second of these twin

requirements is much more reasonable than the first. Many of Trafficland’s nighttime

images are so severely distorted by the lens flares that it is nearly impossible even for a

human to determine the amount of traffic, and working with these images would be a real

challenge. However, tracking cars going away from the camera will obviously be

necessary for a fielded system, and future systems could use either the rear vehicle lights

or the fairly bright reflected glare from the headlights to accomplish this.

3) Vehicles are confined to the road plane, and there exists a region of interest with

straight edges. Also, the number of lanes in the region of interest is constant. These

requirements are necessary for the calculation of the geometry of the situation.

4) The width of each lane in the picture is approximately 11.5 feet. This assumption supplies

the reference distance required to determine the scale of the image and thus the length of the region

of interest. The validity of the assumption is taken from [8] which states that virtually all

American highways have lane widths between 10 and 12.5 feet at all times, with lane

widths close to 12 feet being the most common. Other systems have used a variety of

means to attempt to produce a scale measurement, mostly by placing physical marks on

the road [9], [10], although [1] did so by assuming a known distribution of vehicle

lengths. However, in this situation it was impossible to have an operator placing marks

on or near the road. Estimating by mean vehicle length requires an algorithm that can

accurately determine vehicle length of all vehicles, including trucks, which is difficult to

construct and must be run over a considerable period to get an accurate mean value.

Furthermore, it is not at all clear that this mean vehicle length is more constant from road

to road than the mean lane width. [11] reports evidence that the mean vehicle length

changed considerably depending on the time of day, the highway, and the lane observed,

primarily due to the considerable variation in the presence of large trucks, leading to large

errors in systems that assumed a constant vehicle length. Using the lane width as a

calibration tool appears to be a novel suggestion, and it seems a sensible choice for a

variety of situations, not limited to low-frame rate video. It is perhaps worth noting that

if the lane width did not hold to the 10-12.5 foot range, then the validity of assumption

(5) would be in question anyway, as this would affect the density-speed relationship.

5) Traffic speed and density have a known, and constant relationship, specifically Edie’s

model as given in [4]. This assumption is admittedly somewhat controversial. While the

inverse correlation between speed and density is obvious, the exact relationship has been

a topic of considerable debate. For decades, it was believed to be a linear relationship (the Greenshields model) on

the basis of a single study using seven data points all collected from a single highway [4].

Further study by Greenberg found that a logarithmic relationship was the best fit, as in

Fig. 1. However, despite the seemingly excellent fit, a number of caveats can be raised

with Greenberg’s methods, and several later studies found that Greenberg’s relationship

was only a mediocre fit to their data. The modern favored choice for the relationship is

Edie’s hypothesis, which is a piecewise function shown in Fig. 2. The piecewise

relationship has not only been confirmed by a rigorous study into the matter [12], it also fits

well with theoretical models of traffic flow, which invariably divide the problem into at

least two subcases corresponding to free-flow and congested-flow, if not more. Despite

all the debate as to the precise nature of the relationship, the actual difference for this

application between the initial Greenshields model and the most recent Edie’s model is

only at most 10-20%. However, the data for these studies appears to have been collected

only during the daytime and only in normal weather conditions. How nighttime

conditions and adverse weather might affect the relationship is quite unknown.

Fig. 1. Greenberg’s speed-density hypothesis, plotted with his data.

Fig. 2. Edie’s speed-density hypothesis, plotted with his data.
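
The three speed-density models discussed above can be sketched in Python as follows. The two-regime form used for Edie's hypothesis (exponential at low density, Greenberg's logarithm at high density) is how the model is commonly summarized; the exact form and constants in [4] may differ. All parameter names and numeric values below are illustrative assumptions, not values taken from this paper.

```python
import math


def greenshields_speed(k, v_free, k_jam):
    """Linear model: speed falls from v_free at k = 0 to zero at k_jam."""
    return max(0.0, v_free * (1.0 - k / k_jam))


def greenberg_speed(k, v_opt, k_jam):
    """Logarithmic model: v = v_opt * ln(k_jam / k), undefined at k = 0."""
    if k <= 0:
        raise ValueError("Greenberg's model diverges as density goes to zero")
    return max(0.0, v_opt * math.log(k_jam / k))


def edie_speed(k, v_free, k_opt, v_opt, k_jam, k_break):
    """Two-regime model (assumed form): exponential for free flow,
    Greenberg's logarithm for congested flow, switching at k_break."""
    if k < k_break:
        return v_free * math.exp(-k / k_opt)
    return greenberg_speed(k, v_opt, k_jam)


# Hypothetical parameters: speeds in mph, densities in vehicles per lane-mile.
PARAMS = dict(v_free=55.0, k_opt=90.0, v_opt=20.0, k_jam=180.0, k_break=50.0)
speeds = [edie_speed(k, **PARAMS) for k in (10.0, 30.0, 100.0)]
```

In a deployed system the parameters would have to be fitted to measured data for the road in question.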

Finally, it is important to note that there is another relationship between traffic

variables that could be useful in further study. The relationship is between traffic speed

and traffic volume, which is defined as cars per lane-hour. I chose not to use this

relationship in my algorithm because the relationship between volume and speed is

considerably less well-established than the one between speed and density. However,

calculating volume does not require finding the length of the region of interest, and thus it

is immune from that source of error. If the expected level of error from the estimation of

the length of the region were greater than the expected error from the estimation of the

speed-volume relationship, then it would be a reasonable choice to abandon density and

instead calculate volume. Determining whether this is in fact the case unfortunately goes

beyond the scope of this paper, but if it were, then the change would be quite easy to

make. The part of the algorithm that calculates the length of the region would simply be

dropped, and instead the number of cars counted by the second part of the algorithm,

divided by the product of the time period of operation and the number of lanes, could be

plugged into the function in [4] that computes an expected relationship between volume

and speed.

Algorithm Operation

This section details the workings of the algorithm. The first part explains the counting of

cars, and the second part explains the camera calibration and determination of the region

length.

I. Counting Cars

The algorithm operates on a sequence of nighttime images, sampled at virtually

any frame rate. The images in the dataset are originally in color, but they are converted to

greyscale for analysis. For nighttime images, there is normally very little useful

information in the color channels, and what information there might have been is

obscured by the terrible color distortion in Trafficland’s cameras. The car counting part

of the algorithm assumes that an operator has drawn a rectangular region of interest and

that we are counting cars only in that region.

The car-counting algorithm operates essentially by counting headlights. Unlike

other papers [13], I do not assume a particular shape for headlights, nor do I require that

each vehicle have exactly two nearly identical headlights. While those assumptions are

often valid, occlusion, reflections on the road and on the vehicles, and varying headlight

configurations complicate the picture. Instead, I merely assume that each vehicle has one

or more brightly colored dots on or right next to it. The algorithm finds those dots and

attempts to determine which dots belong to which vehicles.

Fig. 3. A typical unprocessed Trafficland image

The first step of the algorithm is to crop the image to the smallest size that

contains the region of interest, and then to set all the pixels outside the region of interest

to zero intensity. The image is then converted to greyscale. A typical image at this stage

is shown in Fig. 3. The image is then top-hat filtered. Top-hat filtering is a technique

used to smooth out uneven dark backgrounds. Top-hat filtering is defined by subtracting

the result of performing a morphological opening on the input image from the input image

itself. This has the effect of reducing background noise by eroding it away, and thus

producing a more even, clean background. Results of top-hat filtering are shown in Fig.

4. Top-hat filtering requires a choice of a structuring element for the morphological

opening. The choice of a disk shape was easy, as this is standard. The choice of the size of

the disk was more difficult and also somewhat arbitrary. Examination of the scale of a

few Trafficland cameras showed that a disk radius of 10 pixels gave good results. Some

further experimentation showed that the performance of the algorithm was highly insensitive to changes in this size.

Fig. 4. Image after top-hat filtering.
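
For readers unfamiliar with the operation, the following pure-Python sketch shows a white top-hat filter, that is, the image minus its morphological opening (an erosion followed by a dilation) with a flat disk structuring element. It is written for clarity rather than speed; the function names and the border handling (out-of-image neighbours are ignored) are assumptions of this sketch, not a description of the MATLAB routine used in the actual implementation.

```python
def disk_offsets(radius):
    """(dy, dx) offsets covered by a flat disk structuring element."""
    r = int(radius)
    return [(dy, dx)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]


def _morph(img, offsets, pick):
    """Grey-scale erosion (pick=min) or dilation (pick=max).

    Neighbours falling outside the image are ignored."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = pick(img[y + dy][x + dx]
                             for dy, dx in offsets
                             if 0 <= y + dy < h and 0 <= x + dx < w)
    return out


def top_hat(img, radius):
    """White top-hat: image minus its opening by a disk of given radius."""
    offsets = disk_offsets(radius)
    opened = _morph(_morph(img, offsets, min), offsets, max)
    return [[img[y][x] - opened[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


# A single bright pixel on a uniform grey background of 0.2.
demo = [[0.2] * 9 for _ in range(9)]
demo[4][4] = 1.0
filtered = top_hat(demo, 2)
```

Features smaller than the disk (such as the bright pixel) survive the subtraction, while the background, which the opening reproduces, is removed.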

The next step is to choose a threshold and convert the greyscale image into black

and white. Choosing the threshold is obviously the difficult part. If there are headlights

in the image, then Otsu’s method, which chooses a threshold to minimize the intra-class

variance, works very well. This is essentially because in this case, the image histogram

will be strongly bimodal between headlight and not-headlight, and this method will easily

find the dividing point. However, in an image that has no headlights, Otsu’s method will

return terrible results, as it will cause a segmentation of the road itself based on random

noise on the road, but in this case we want it to segment to all black pixels. One way to

solve this problem is simply to set a fixed parameter that represents a minimum

reasonable headlight intensity, and to take the threshold to be the maximum of this

intensity and the threshold computed by Otsu’s method. In practice, this method will

return good results with almost all images, as the difference between the road

background, generally 0 to 0.3 in intensity, and the headlight intensity, usually 0.9 to 1, is so

extreme that any parameter value choice of 0.4 to 0.7 will correctly separate them, and the

choice of the parameter within this region will have little effect on the quality of the

segmentation.
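
The fixed-floor thresholding rule just described can be sketched as follows. This is a plain histogram-based version of Otsu's method written for illustration; the function names, the 256-bin histogram, and the default floor of 0.5 are assumptions of this sketch (the text only states that any floor between 0.4 and 0.7 behaves similarly).

```python
def otsu_threshold(pixels, bins=256):
    """Otsu's threshold for intensities in [0, 1].

    Maximising the between-class variance is equivalent to minimising
    the intra-class variance."""
    hist = [0] * bins
    for p in pixels:
        hist[min(bins - 1, int(p * bins))] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for t in range(bins - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split here
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, t
    return (best_bin + 1) / bins


def headlight_threshold(pixels, min_headlight=0.5):
    """Never threshold below a minimum plausible headlight intensity."""
    return max(otsu_threshold(pixels), min_headlight)


# A frame with no headlights: Otsu alone would split the road noise,
# but the floor keeps every pixel black.
dark_frame = [0.05, 0.10, 0.15, 0.20] * 25
threshold = headlight_threshold(dark_frame)
white_pixels = [p for p in dark_frame if p > threshold]
```

With the floor in place, an empty road segments to all black, which is the behaviour the fixed parameter is meant to guarantee.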

In the expectation this method might not return optimal results for all images, I

pursued a method that would learn the threshold from previous images. The algorithm

begins using the fixed parameter method with a low fixed parameter value. A record is

kept of all the thresholds determined by Otsu’s method during the past 100 images –

including those determinations when the value was not used as the actual threshold

because it was too low. To this vector, Otsu’s method is itself applied again, to separate

the vector into values that were found during no vehicle presence and ones that were

found during an actual vehicle presence. If there was at least one image with headlights

and one image without headlights, this method will return good results. Once again, a

problem can occur when trying to separate a non-bimodal distribution. I make the

assumption that in at least one of these images, a car was present. To test to see whether

all the images have headlights in them, I do a 2-sample t-test, comparing the values below

the computed threshold to the values above the computed threshold. If the difference is

significant, then there are likely two different distributions in the data – one with

headlights, and one without headlights, and I set the minimum parameter to the threshold

computed on the vector of 100 previous thresholds. If the difference is not significant,

then probably all of the images had cars in them, and I keep the old threshold. This

assumes that the values of the grey threshold computed when there are cars in the picture

and when there are not cars in the picture both have normal distributions, and I have

found this to be a fairly accurate assumption, as confirmed both visually and by the

Lilliefors normality test. Fig. 5 shows a histogram of the thresholds determined in a 200

image sequence with a highly bimodal distribution due to the presence or absence of

headlights. I found this method to accurately determine when there were actually cars

present in the picture, and to choose a grey threshold accordingly that reflected the

lighting conditions of the image – e.g., images with brighter backgrounds had a higher

threshold. However, due to the insensitivity of the rest of the algorithm to the value of

this parameter, this method failed to return significantly better or even significantly

different results on the actual dataset from the simple-minded hard-coded parameter

method.

Fig. 5. Histogram of 100 threshold values determined by Otsu’s method.

Having converted the image into black and white, the next step is to identify cars

from the white blobs that may correspond to headlights or reflections. A typical image at

this stage is shown in Fig. 6. As discussed before, other work [13] has attempted to use

prior knowledge about headlight shape to accomplish this segmentation. Unfortunately,

they give virtually no details about their algorithm, so it was not possible to reproduce

their results. After a considerable amount of experimentation with template-based

matching, the technique used in [13], I decided that these assumptions about headlight

shape were not valid enough in general to be useful, and instead I use a simpler method

that relies only on the assumption that all the headlights and headlight reflections on a car

will be close. First, the binary image is dilated, which tends to connect the unconnected

blobs belonging to a single car. Unfortunately, this simple technique runs the risk of

joining blobs of adjacent cars incorrectly, leading to undercounting the actual number of

vehicles. To mitigate this problem, I use the fact that the user has drawn segments

corresponding to the lanes in the region in the initial setup and I separate blobs along

those lane boundaries. Assuming that all cars are entirely in a lane, this essentially solves

the problem of connection across lane boundaries, and leaves only the potential problem

of connecting two cars within a lane. But since the headlights of cars in a single lane are

separated by dark car bodies, this is rarely a problem. Simply counting the blobs found at

this stage gives a reasonable result, but I do one further step of noise-reduction that

improves performance further. Since all headlights should by now have been joined

together in blobs of considerable size, I eliminate all blobs below a certain threshold size,

since these usually correspond to small reflections on or around vehicles that have already

been counted. The size chosen is an area of 15 pixels, which is an extremely

conservative estimate for the size of a headlight, especially after dilation. Rather than an

assumption about the size of the headlights in the images, it is best considered as a

constraint on the choice of the region of interest by the traffic operator, requiring the

region to be placed close enough to the camera so that dilated headlights have an area

of more than 15 pixels. Indeed, if this is not the case, the rest of the algorithm is unlikely

to perform well anyway, as the resolution will be very poor.

Fig. 6. Black and white segmentation.
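
The blob-counting step can be illustrated with the following sketch, which assumes the binary image has already been dilated. Lane separation is modelled with a hypothetical per-pixel lane index map (`lane_of`), so that connected components never cross a lane boundary; this representation, the 8-connectivity, and the function name are assumptions made for the example rather than details of the reported implementation.

```python
from collections import deque


def count_vehicles(binary, lane_of, min_area=15):
    """Count white blobs, split along lane boundaries, of at least min_area.

    binary: 2-D list of 0/1 values (already dilated).
    lane_of: 2-D list giving the lane index of every pixel (hypothetical).
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            lane = lane_of[y0][x0]
            area = 0
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill within one lane
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]
                                and lane_of[ny][nx] == lane):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:  # drop small reflections
                count += 1
    return count


# One 4x8 blob straddling the boundary between two lanes, plus a speck.
H, W = 6, 12
image = [[0] * W for _ in range(H)]
for yy in range(1, 5):
    for xx in range(2, 10):
        image[yy][xx] = 1
image[0][0] = 1
lanes = [[0 if xx < 6 else 1 for xx in range(W)] for _ in range(H)]
n_cars = count_vehicles(image, lanes)
```

In the example, the blob that spans both lanes is split into two vehicles, and the isolated speck is discarded by the area filter.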

Dilating the image in the above step requires a choice of a structuring element,

and this choice is best determined in a principled manner, as the performance of the

algorithm is considerably affected by it. In images with small headlights, like the ones

shown in the above figures, dilation is not necessary, though a small amount rarely hurts.

In images with a closer view of the traffic, dilation becomes essential. The shape of the

structuring element is simply a disk, as is standard. The idea behind my method of

choosing the radius of the disk is that the disk needs to be large enough to join headlight

pairs but should not be much larger, else it will run the risk of joining together different

cars. The algorithm for calculating this size depends on some assumptions about

headlight size taken from [14] and also informally observed in Trafficland’s images. The

key assumption specifically is that the average distance between the headlights is

approximately proportional to the typical headlight size, as recorded by the image. One

case in which the assumption clearly fails is when the cars are traveling almost directly towards

the camera, as then the distortion will make the headlights seem much larger. Another

case in which the assumption does not work well is when the traffic is traveling almost directly

across the camera’s field of view; however, in this case, some dilation is still useful in

connecting cars with the glare reflection, which will be particularly prominent. In

between these two extremes, however, the constancy of this ratio is good enough that

setting the size of the structuring element to be a constant times the estimated headlight

size returns excellent results. The correct ratio is difficult to determine exactly, but it is

close to one; the value I use is 1.3. I measure the size of the headlights by finding the

median area of all the blobs found in the first 100 frames, and calculating the

corresponding radius of a circle of this area.
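
The radius rule above amounts to a short computation, sketched here in Python. The function name is illustrative; the ratio of 1.3 and the use of the median blob area come from the text.

```python
import math
from statistics import median


def dilation_radius(blob_areas, ratio=1.3):
    """Radius of the dilation disk from observed blob areas (in pixels).

    The typical headlight radius is that of a circle whose area equals
    the median blob area; the disk radius is a constant multiple of it."""
    if not blob_areas:
        raise ValueError("no blobs observed in the calibration frames")
    headlight_radius = math.sqrt(median(blob_areas) / math.pi)
    return ratio * headlight_radius


# Blob areas collected over the first frames (made-up values).
radius = dilation_radius([11.0, 12.5, 13.0, 40.0, 9.0])
```

Using the median rather than the mean keeps a few large merged blobs or glare patches from inflating the estimate.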

II. Determining the Region Length

The algorithm I use to determine the region length is taken from [15], adapted to

the information available for the scene. First, the traffic operator gives the initial set-up

information pictured in Fig. 7. This includes a region of interest, whose projection onto

the road plane must be rectangular in the world coordinates and a trace of all the lanes in

this region of interest. For good performance, the region of interest must be a straight

section of road, it should begin as close to the camera as possible, and it should not

extend so far that the resolution at the end of the region is too poor (see above for a more

precise definition).

Fig. 7. What the traffic operator draws.

The estimate of the region length begins with computing a camera calibration

from the data that the operator has given. The camera calibration can then be combined

with some simple geometry to estimate the region length. The camera calibration

technique described in [15] is easier for the traffic operator than the one described in [7],

which requires that a grid evenly spaced along the road axis be determined by the

operator, which in practice is a difficult judgment for a human to make. I do not repeat

the detailed derivation of the algorithm in [15] here, but I give an overview of its

operation as applied to this situation. Most of what follows is taken from this paper; for

brevity, I omit the exact citations.

Camera calibration involves finding a camera’s intrinsic and external parameters.

First consider the intrinsic parameters. Recall that they can be represented by the matrix

\[
A = \begin{pmatrix}
\alpha_u & -\alpha_u \cot\theta & u_0 \\
0 & \dfrac{\alpha_v}{\sin\theta} & v_0 \\
0 & 0 & 1
\end{pmatrix}
\]

To simplify the calibration process, I make several assumptions about the internal

parameters which are approximately true for most cameras and common in computer

vision applications. First I assume that the axes are in fact perpendicular, so that θ = 90°. I

also assume that the horizontal and vertical focal lengths are equal, and that u0 and v0, the

coordinates of the camera center are actually at the image center. This reduces what was

previously a five parameter problem to a one parameter problem, leaving αu as the only

unknown.

To calculate the external parameters, we first calculate a vanishing point. The

vanishing point that can be calculated most accurately is the one in the road direction.

We could use only the region of interest boundaries to calculate this point, but we will get

better results if we also use the tracings of the lanes the user has made. Specifically, we

wish to find the point whose sum of squared distances to all the lines is a minimum. This

least squares estimate can be easily determined by solving a system of linear equations of

the form Mx = b. If there are n lanes on the road, then M and b will each have (n+2)

rows. Let Li be a unit vector in homogeneous coordinates representing the direction of

the ith line (out of the n+2), and let Pi be a point on that line (in homogeneous

coordinates), then we can define the ith row of M and b as follows:

\[
M_i = \begin{bmatrix} -L_{i2} & L_{i1} \end{bmatrix}, \qquad
b_i = (L_i \times P_i)^T \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T
\]

Then the vanishing point x is simply the pseudo-inverse of M times b. The vanishing

direction can be computed as A^{-1}x, where A is the camera intrinsic parameters matrix

(which is not yet entirely known).
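
As an illustration of the least-squares step, the sketch below finds the point minimizing the sum of squared perpendicular distances to a set of image lines. Each line contributes one row of M (its unit normal) and one entry of b (the normal's dot product with a point on the line), and the 2x2 normal equations are solved directly, which gives the same answer as the pseudo-inverse when the lines are not all parallel. The function name and the input format (a point and a direction per line) are assumptions of this sketch.

```python
import math


def vanishing_point(lines):
    """Least-squares intersection of image lines.

    lines: list of ((px, py), (dx, dy)), a point on each line and its
    direction. Returns the (x, y) minimising the sum of squared
    perpendicular distances to all lines."""
    a11 = a12 = a22 = c1 = c2 = 0.0
    for (px, py), (dx, dy) in lines:
        norm = math.hypot(dx, dy)
        nx, ny = -dy / norm, dx / norm  # unit normal: one row of M
        b = nx * px + ny * py           # matching entry of b
        # Accumulate the normal equations (M^T M) x = M^T b.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        c1 += nx * b
        c2 += ny * b
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel in the image")
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Three traced lines that all pass through the image point (100, 50).
traced = [((0.0, 0.0), (100.0, 50.0)),
          ((0.0, 100.0), (100.0, -50.0)),
          ((100.0, 0.0), (0.0, 1.0))]
vp = vanishing_point(traced)
```

With noisy operator tracings the lines will not meet exactly, and the same code returns the best compromise point.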

We can describe the world coordinates in terms of three axes: Gx, which is

perpendicular to the vanishing direction, Gy, which is parallel to the vanishing direction,

and Gz, which completes the coordinate system. Let v denote the normalized vanishing

direction. Let ϕ be the roll angle about the vanishing direction, and define β = 1/(1 + v_y).

Then we can determine the three axes in terms of these variables, only two of which are

unknown. [15] provides the expressions for Gx and Gz (the expression for Gy is trivial);

unfortunately, the expression for Gz contains several apparently typographical errors. The

correct expressions are:

\[
G_x = \begin{bmatrix}
(1 - \beta v_x^2)\cos\phi + \beta v_x v_z \sin\phi \\
v_z \sin\phi - v_x \cos\phi \\
-(1 - \beta v_z^2)\sin\phi - \beta v_x v_z \cos\phi
\end{bmatrix}
\]

\[
G_z = \begin{bmatrix}
(1 - \beta v_x^2)\sin\phi - \beta v_x v_z \cos\phi \\
-v_x \sin\phi - v_z \cos\phi \\
(1 - \beta v_z^2)\cos\phi - \beta v_x v_z \sin\phi
\end{bmatrix}
\]

If these axes were known (currently they are written in terms of two unknowns), we

would then be able to use the axes, our internal parameter matrix, and some geometry to

calculate distances on the road plane. Specifically, this can be done in the following

manner. Given an image point x, compute its projection p = A^{-1}x. This is of course a

vector in the direction of the ray that goes from the camera center to the point x. But the

intersection of this ray and the road plane is

\[
P = \frac{\|G_z\|^2}{\hat{p} \cdot G_z}\,\hat{p}
\]

Given two such projections P1 and P2, the distance between them is simply ||P1 – P2||, up

to some unknown but constant scale factor.
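
The back-projection onto the road plane can be sketched as follows, assuming the simplified intrinsic matrix described earlier (equal focal lengths `alpha`, perpendicular axes, and principal point `(u0, v0)`). The function names are illustrative, and `g_z` stands for the road-plane normal axis Gz, which in practice comes out of the calibration.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def back_project(u, v, alpha, u0, v0, g_z):
    """Intersect the viewing ray of pixel (u, v) with the road plane.

    Uses p = A^{-1} x with A = [[alpha, 0, u0], [0, alpha, v0], [0, 0, 1]].
    The result does not depend on the length of p, so p is not normalised."""
    p = ((u - u0) / alpha, (v - v0) / alpha, 1.0)
    denom = dot(p, g_z)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the road plane")
    scale = dot(g_z, g_z) / denom
    return tuple(scale * c for c in p)


def road_distance(P1, P2):
    """Distance between two road-plane points, up to the unknown scale."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(P1, P2)))


# Hypothetical calibration: alpha = 400, 320x240 image, tilted road normal.
G_Z = (0.0, 0.6, 0.8)
near = back_project(160.0, 120.0, 400.0, 160.0, 120.0, G_Z)
far = back_project(160.0, 520.0, 400.0, 160.0, 120.0, G_Z)
unscaled_length = road_distance(near, far)
```

Both returned points satisfy P · Gz = ||Gz||², which is a convenient sanity check that they lie on the same plane.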

Of course, all of this assumes that we have values for the two unknowns αu and ϕ. The key

insight of [15] is that with a knowledge of the ratios of lengths in the picture, we can use

a non-linear optimization process to solve for those two unknowns. For the Trafficland

situation, say again that we have n lanes. Then we know of n+2 segments in the direction

of the road that must be of the same length in the world. Also, all the 2n+2 segments

perpendicularly connecting the lanes at the beginning and end of the region of interest

must be of the same length. Let us denote the n+2 road-parallel segments as q_0, …, q_{n+1} and

the 2n+2 perpendicular segments as s_0, …, s_{2n+1}. The residual I compute is a modification

of the one in [15] and is defined by:

\[
r = \sum_{i=1}^{n+1} \left( \frac{q_0}{q_i} - 1 \right)^2
  + \sum_{i=1}^{2n+1} \left( \frac{s_0}{s_i} - 1 \right)^2
\]
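
For a candidate pair of unknowns, the residual is evaluated from the back-projected segment lengths. A minimal sketch of that evaluation, assuming the lengths have already been computed on the road plane (the function name and list-based interface are illustrative):

```python
def length_ratio_residual(parallel_lengths, perpendicular_lengths):
    """Sum of squared deviations of length ratios from one.

    parallel_lengths: computed lengths of the road-parallel segments.
    perpendicular_lengths: computed lengths of the perpendicular segments.
    Within each group, every segment is compared with the first one."""
    def group_term(lengths):
        reference = lengths[0]
        return sum((reference / length - 1.0) ** 2 for length in lengths[1:])

    return group_term(parallel_lengths) + group_term(perpendicular_lengths)


# A perfect calibration makes all segments in a group equal, so r = 0.
perfect = length_ratio_residual([3.0, 3.0, 3.0], [1.5, 1.5, 1.5, 1.5])
```

Because only ratios of lengths enter the residual, the unknown overall scale factor cancels and does not need to be known during the optimization.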

A non-linear optimization process can then be used to solve for the αu and ϕ that minimize

r. [15] recommends the Levenberg-Marquardt method, but I use a subspace trust region

method based on the interior-reflective Newton method, as some informal

experimentation showed that this algorithm was much less likely to converge to incorrect

local minima when given an initial value distant from the correct one. Finally, the scale

factor can be easily determined by assuming the lane width as stated above and dividing

the actual lane width by the average of the computed ones.
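
The final scaling step can be written out as a short sketch. The function name and the example numbers are assumptions; the 11.5-foot lane width is the value assumed earlier in the paper.

```python
def region_length_ft(computed_lane_widths, computed_region_length,
                     lane_width_ft=11.5):
    """Convert an unscaled region length into feet.

    computed_lane_widths: lane widths measured on the road plane in the
    calibration's arbitrary units.
    computed_region_length: region length in the same arbitrary units."""
    if not computed_lane_widths:
        raise ValueError("need at least one computed lane width")
    mean_width = sum(computed_lane_widths) / len(computed_lane_widths)
    scale = lane_width_ft / mean_width  # feet per arbitrary unit
    return scale * computed_region_length


# Three lanes about 0.5 units wide and a region 10 units long.
length_ft = region_length_ft([0.49, 0.50, 0.51], 10.0)
```

Averaging the computed widths before dividing spreads any tracing error in an individual lane across all of them.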

Some Considerations of Computational Efficiency

The algorithm essentially has two parts, the initial setup and calibration, and the

counting of vehicles in actual operation. Obviously, the efficiency requirements of the

two are quite different. The part of the algorithm that operates in real-time must be

highly efficient; however, the initial setup, which only happens once, is not nearly so

constrained.

Virtually the entirety of the computational time for the initial setup is consumed

by the nonlinear optimization process, which must compute a relatively computationally

intensive function many times to find a good minimum. The function that it computes is

constant time with respect to the size of the image, but it contains several matrix

inversions and a good deal of matrix multiplication and arithmetic operations. Typical

running time for the nonlinear optimization process to complete is about ten seconds on a

Pentium 4. Considering that it will take the operator significantly longer to draw the lines

on the image that are used in the calibration process, this seems within acceptable limits.

The part of the algorithm which must operate in real time is the car counting.

Empirically, this algorithm is highly efficient, requiring only about 0.2 seconds per frame,

whereas the frames arrive at less than one frame per second. Even if no frames are

dropped (and frames could be dropped without significantly affecting performance), each

computer could therefore process the video feeds from 5 cameras simultaneously,

which compares favorably with most published algorithms, which are

usually able to handle only one camera [1]. Part of the reason for the high efficiency

comes from the small size of the images that are effectively being worked with. The

original traffic image is 320x240. But most of this is background, and the region of

interest size is typically on the order of 100x100. Profiling the execution of the algorithm

using the excellent profiler tool in MATLAB showed that the algorithm spent 64.4% of

its time directly computing morphological operations of some kind. When the time to

check arguments, resize matrices and execute other miscellaneous utility functions

connected with the morphological operations is taken into account, the actual percentage

of the time spent doing morphological operations is between 80% and 90%. Virtually all

of the rest of the CPU time is spent resizing the image and converting it to greyscale.

Computing the threshold using Otsu’s method takes only 2.0% of the time. The top hat

transformation at the beginning is particularly computationally intensive (55%), as for a

circular structuring element, the computational complexity of the morphological closing is

proportional to the area of the circle, which is rather large. This may indicate that in

situations where computational efficiency is at a premium, a smaller structuring element or

one of an easier shape should be substituted, though this was not investigated.
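Two of the figures in this paragraph are simple enough to check with a short sketch: the number of camera feeds one machine can serve, and how the pixel count of a circular element grows with its radius. The function names are hypothetical, and the radii 5 and 10 are illustrative only, since the radius actually used is not restated here. The cost model assumes a naive implementation that visits every pixel of the element for every output pixel.

```python
import math


def cameras_per_machine(seconds_per_frame, frames_per_second):
    """Feeds one machine can keep up with, assuming no frames are dropped."""
    budget = 1.0 / frames_per_second  # seconds available between frames
    # Small epsilon guards against floating-point results just below an integer.
    return math.floor(budget / seconds_per_frame + 1e-9)


def disk_pixel_count(radius):
    """Number of pixels in a digital disk of the given integer radius."""
    return sum(
        1
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dx * dx + dy * dy <= radius * radius
    )


# 0.2 s per frame at (at most) one frame per second.
print("cameras per machine:", cameras_per_machine(0.2, 1.0))

# Naive per-pixel cost is proportional to the element's pixel count.
small, large = disk_pixel_count(5), disk_pixel_count(10)
print(f"radius 5: {small} pixels, radius 10: {large} pixels, "
      f"ratio {large / small:.2f}")
```

Under this model, halving the radius cuts the work per pixel by roughly a factor of four, which is why a smaller element is the first thing to try when efficiency is at a premium.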

The memory requirements of the algorithm are very modest. Since each image is

processed individually, only the data to process that particular image must be stored. In

my implementation, that is approximately 4 times the memory requirement of the image

cropped down to the region of interest, because temporarily we must store the original

image, the top-hat enhanced image, the segmented image, and the dilation of the

segmented image. The other major memory requirement comes from storing the sizes of

the headlights of the past 100 frames. In a fielded system, this would really not need to

be calculated every frame; instead it could just be re-calculated every few thousand

frames. However, during its calculation it takes a matrix of about 1000 elements to store

it, assuming about ten blobs per image. The other variables are independent of the size of

the images and very small.
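The memory estimate above amounts to simple arithmetic, sketched here in element counts rather than bytes, since the storage type per element is not stated. The function name is hypothetical; the defaults (four buffers, 100 frames of history, about ten blobs per image) and the 100x100 region of interest are taken from the text.

```python
def working_set_elements(roi_height, roi_width, buffers=4,
                         history_frames=100, blobs_per_frame=10):
    """Return (image_elements, history_elements) held at any one time.

    buffers: original, top-hat enhanced, segmented, and dilated images.
    history_frames * blobs_per_frame: stored headlight sizes.
    """
    image_elements = buffers * roi_height * roi_width
    history_elements = history_frames * blobs_per_frame
    return image_elements, history_elements


images, history = working_set_elements(100, 100)
print(f"image buffers: {images} elements, headlight history: {history} elements")
```

For a 100x100 region of interest this gives 40,000 elements for the image buffers and 1,000 for the headlight history, so the image buffers dominate.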

Empirical Results

The gold standard in the empirical validation of computer vision speed detection

algorithms is simultaneous inductance loop data. Inductance loops are wires run under

the highway and connected to electrical monitoring equipment in such a way that a clear

and easily measurable electrical impulse occurs when an axle rolls over the wire. By

building two inductance loops close together at a known distance, the speed of traffic on

the highway can be measured very accurately. If the images being analyzed by a

computer vision algorithm have simultaneous loop detector data, the algorithm’s results

can be compared with the known good speeds from the inductance loops and the

algorithm validated with high accuracy.
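As a minimal sketch of how a pair of loops yields speed: the speed is the known spacing divided by the time between the two impulses. The function names, the 5 m spacing, and the timestamps below are hypothetical values for illustration.

```python
def loop_pair_speed(loop_spacing_m, t_first_s, t_second_s):
    """Speed in m/s from the impulse times at two loops a known distance apart."""
    dt = t_second_s - t_first_s
    if dt <= 0:
        raise ValueError("second impulse must occur after the first")
    return loop_spacing_m / dt


def mean_speed(speeds):
    """Arithmetic mean of individual vehicle speeds."""
    return sum(speeds) / len(speeds)


# Hypothetical impulse times (seconds) for three vehicles over loops 5 m apart.
crossings = [(10.00, 10.25), (12.50, 12.70), (15.00, 15.20)]
speeds = [loop_pair_speed(5.0, t1, t2) for t1, t2 in crossings]
print(f"mean speed: {mean_speed(speeds):.1f} m/s")
```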

Unfortunately, no such simultaneous data is publicly available. Without it, a total

verification of the algorithm’s accuracy is impossible, but significant confirmation is still

possible. Recall that the accuracy of the entire algorithm rests on the accuracy of three

components: the counting of the cars, the determination of the size of the region, and the

relationship between speed and density; if all three of these are correct, then the speed

estimates produced must also be correct. The third of these is impossible for me to test,

but it has been verified by numerous studies into the matter, and so it is reasonable to

assume its accuracy. The second of these is also nearly impossible for me to test

accurately. However, I can say informally that the algorithm returns results within some

reasonable bound – there is at least no egregious error in implementation. More

importantly, this algorithm has been used before in [15], and they provide considerable

empirical validation of the approach using data obtained from physically measuring the

road. Thus, the only part of the algorithm whose accuracy is in serious question is the car

counting, and this is easily checked by counting the cars by hand and comparing that

actual result to the estimated result found by the algorithm. In brief, such a comparison

shows that the algorithm has excellent accuracy.
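To make the dependence on the three components concrete, the following sketch chains them together: a vehicle count and a region size give a density, and a speed-density relationship turns that density into a speed. The Greenshields linear model is used here purely as a stand-in, and it should not be read as the relationship used by the algorithm. All names and numbers (region length, lane count, free-flow speed, jam density) are hypothetical.

```python
def density(vehicle_count, region_length_km, lanes):
    """Vehicles per kilometer per lane in the observed region."""
    return vehicle_count / (region_length_km * lanes)


def greenshields_speed(k, free_flow_speed, jam_density):
    """Greenshields linear model: v = v_f * (1 - k / k_j).

    Stand-in speed-density relationship for illustration only.
    """
    k = min(max(k, 0.0), jam_density)  # clamp to the model's valid range
    return free_flow_speed * (1.0 - k / jam_density)


# Hypothetical setup: 4.3 vehicles counted in a 0.1 km, 3-lane region,
# with a 100 km/h free-flow speed and a jam density of 120 veh/km/lane.
k = density(4.3, 0.1, 3)
v = greenshields_speed(k, free_flow_speed=100.0, jam_density=120.0)
print(f"density: {k:.1f} veh/km/lane, estimated speed: {v:.1f} km/h")
```

An error in any one input propagates directly to the output: a miscounted vehicle or a misjudged region length changes the density, and the speed estimate moves with it.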

However, it is not sufficient validation to test the algorithm on a single camera in

that manner, and furthermore it is not really sufficient validation of the robustness of the

algorithm to test the algorithm on a camera whose images have been used to develop the

algorithm. To provide a valid test of robustness, I developed the algorithm while working

with the images of only one camera. Once the algorithm was performing well, I froze the

code and then tested it on image sequences from several new cameras. However, I did

not choose the new cameras randomly – rather I chose only cameras that met the fairly

restrictive criteria outlined in the Assumptions section. Unfortunately, only a small

percentage of Trafficland’s cameras actually meet those criteria; however, I believe that

my algorithm will work more or less equally well on all that do. Most of the cameras are

disqualified either because the traffic is going in the wrong direction or because of some

form of severe distortion from headlight or streetlight glare.

The key results of the study are shown in Table 1. They consist of twenty-image

sequences from four cameras, and they compare the hand-counted results with the

automatically determined results. Of the four cameras, one was the base camera the

algorithm was developed on, and three were the new cameras in the test set. The results

show that the algorithm's estimates are essentially unbiased and quite accurate over a

fairly large range of traffic densities.

Camera      Type        Mean    S.D.    % Error

Base        Manual      4.25    1.1

Base        Automatic   4.30    1.7     1.2

Camera 1    Manual      0.40    0.5

Camera 1    Automatic   0.45    0.6     12.5

Camera 2    Manual      7.30    1.5

Camera 2    Automatic   7.15    2.1     2.1

Camera 3    Manual      4.5     1.5

Camera 3    Automatic   4.7     1.8     4.4

Table 1. Empirical Testing Results.
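The % Error values in Table 1 appear consistent with the relative difference between the automatic and manual means for each camera. The short check below recomputes them from the Mean values under that assumption. The function and variable names are hypothetical.

```python
def percent_error(manual_mean, automatic_mean):
    """Relative difference of the automatic mean from the manual mean, in percent."""
    return abs(automatic_mean - manual_mean) / manual_mean * 100.0


# (manual mean, automatic mean) pairs taken from Table 1.
table_1 = {
    "Base": (4.25, 4.30),
    "Camera 1": (0.40, 0.45),
    "Camera 2": (7.30, 7.15),
    "Camera 3": (4.5, 4.7),
}

for camera, (manual, automatic) in table_1.items():
    print(f"{camera}: {percent_error(manual, automatic):.1f}%")
```

Rounded to one decimal place, this reproduces the tabulated 1.2, 12.5, 2.1, and 4.4. The large relative error for Camera 1 reflects its very low mean count, where an absolute difference of 0.05 is a large fraction of 0.40.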

References

1. Dailey, D.J., Cathey, F.W., Pumrin, S., An algorithm to estimate mean traffic speed

using uncalibrated cameras, IEEE Trans. Intelligent Transportation Systems (1), No. 2,

June 2000, pp. 98-107.

2. S. Takaba, M. Sakauchi, T. Kaneko, B. Won-Hwang, and T. Sekine,

Measurement of traffic flow using real time processing of moving pictures,

in Proc. 32nd IEEE Vehicular Technology Conf., San Diego, CA, May 23–26, 1982, pp.

488–494.

3. N. Hashimoto, Y. Kumagai, K. Sakai, K. Sugimoto, Y. Ito, K. Sawai,

and K. Nishiyama, Development of an image-processing traffic flow measuring system,

Sumitomo Electric Tech. Rev., no. 25, pp. 133–137.

4. Traffic Flow Theory, edited by N.H. Gartner, C.J. Messer, and A.K. Rathi.

Washington, D.C.: US Federal Highway Administration. Chap. 2, Traffic Stream

Characteristics, by Hall, F.

5. José Melo, Andrew Naftel, Alexandre Bernardino, José Santos-Victor, Viewpoint

Independent Detection of Vehicle Trajectories and Lane Geometry from Uncalibrated

Traffic Surveillance Cameras, International Conference of Image Analysis and

Recognition (ICIAR 2004), Porto, October 2004, pp. 454-462.

6. Peek Traffic VideoTrak Detection System. Described in http://www.peektraffic.com/File.asp?FileID=ss96-081-1VideoTrak.

7. Worrall, A. D., Sullivan, G. D. and Baker, K. D. A simple, intuitive camera calibration

tool for natural images, Proc. 5th British Machine Vision Conference, 13-16 September,

University of York, York, 1994, pp 781-790.

8. A policy on geometric design of highways and streets (AASHTO Green Book),

American Association of State Highway and Transportation Officials, Jan. 2001, pp.

315-316.

9. K.W. Dickinson and R. C. Waterfall, “Video image processing for monitoring road

traffic,” in Proc. IEE Int. Conf. Road Traffic Data Collection, Dec. 5–7, 1984, pp. 105–

109.

10. R. Ashworth, D. G. Darkin, K.W. Dickinson, M. G. Hartley, C. L. Wan, and R. C.

Waterfall, “Applications of video image processing for traffic control systems,” in Proc.

2nd Int. Conf. Road Traffic Control, London, U.K., Apr. 14–18, 1985, pp. 119–122.

11. Bickel, P., Chen, C., Kwon, J., Rice, J., van Zwet, E., Varaiya, P. Measuring

traffic. (Preprint) June 2004, http://www.stat.berkeley.edu/users/rice/664.pdf

12. Drake, J.S., J.L. Schofer, and A.D. May. 1967. A statistical analysis of speed density

hypotheses. Highway Research Record 154, Highway Research Board, NRC,

Washington, D.C.: 53-87.

13. Cucchiara, R., Piccardi, M., Vehicle detection under day and night illumination, in

Proc. of IIA’99 - Third Int. ICSC Symp. on Intelligent Industrial Automation., Special

Session on Vision Based Intelligent Systems for Surveillance and Traffic Control, 1999,

pp. 789-794.

14. Zwahlen, H.T., and Schnell, T., “Driver-headlamp dimensions, driver characteristics,

and vehicle and environmental factors in retroreflective target visibility calculations,”

Transportation Research Record 1692, National Academy of Sciences, Washington, DC.,

1999.

15. Masoud, O., Papanikolopoulos, N.P., Kwon, E., The use of computer vision in

monitoring weaving sections, IEEE Trans. Intelligent Transportation Systems, (2), No. 1,

March 2001, pp. 18-25.
