Geometric Transformations: Planes, Points, Lines, Projection, and Intersections - Prof. Da, Study notes of Computer Science

An in-depth exploration of the representation of geometric objects, including planes, points, and lines, as well as perspective projection and intersections. It covers the mathematical concepts behind these topics and offers methods for finding intersections and lines or planes that include multiple objects. Essential for students studying computer graphics, engineering, or mathematics.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-9bh

Class Notes CMSC 426
3D Geometry and Projection

Introduction

One of the main goals of computer vision is to use 2D images to determine the structure and position of 3D objects in the world. To do this, we have to discuss geometry. We need to understand how to represent geometric objects in 3D and 2D, and we need to understand the relationship between the 3D world and a 2D image. This note will cover three topics: representation of planes, points and lines; perspective projection that relates the 3D and 2D positions of these objects; and intersections of these objects as well as the process of finding lines or planes that include multiple lines and points.

Representation

In this class we will only consider the simplest geometric objects: points, lines and planes. This is the minimal set of objects that we can consider. Points are the simplest of objects; we must understand lines because light travels in a straight line; and we must understand planes because the image is a plane. However, these simple objects are also very important, and much work in vision is done using just this set of objects.

Points: First, we recall that we represent a point by its coordinates in space. If a point is in 2D, we describe it with x and y coordinates. We will describe 2D points using lower case letters, so we might write p=(x,y). If a point is in 3D we also need a z coordinate, and we use upper case letters, so we could write P=(x,y,z).

2D Lines: Lines are a little more complicated. There are several ways of representing lines, each with its own advantages and disadvantages. In 2D we can represent a line with a single linear equation of the form: ax+by+c=0. Here a, b, and c are constant values that determine the line.
A point (x0,y0) is on this line if the equation is satisfied when we plug in these values for x and y. We can get a useful intuition by rewriting this equation in the following way:

$$
\begin{aligned}
(x,y)\cdot(a,b) &= -c \\
(x,y)\cdot\frac{(a,b)}{\|(a,b)\|} &= \frac{-c}{\|(a,b)\|} \\
(x,y)\cdot(a',b') &= c'
\end{aligned}
\qquad\text{where}\qquad
a' = \frac{a}{\|(a,b)\|},\quad b' = \frac{b}{\|(a,b)\|},\quad c' = \frac{-c}{\|(a,b)\|}
$$

That is, we can create a unit vector, (a’,b’). Then a point is on the line if the inner product between the point and (a’,b’) is equal to some constant value, c’. Keeping in mind that the inner product between (x,y) and (a’,b’) measures the distance from (x,y) to the origin in the direction of (a’,b’), we have the following picture. When we describe a line like this, the vector (a’,b’) will be orthogonal to the line.

It is also useful to recall another way of describing a line, with the equation: y = mx+b. When we describe a line like this, m is the slope of the line.

There is still one more way of describing a line. We can write down a recipe for reaching any point on a line by starting at one point of the line, say (x0,y0), and then moving some distance t in the direction of the line. If we describe the direction of the line with a unit vector, (u,v), then we can write this down by saying that (x,y) is on the line if it satisfies the equation:

$$(x,y) = (x_0,y_0) + t\,(u,v)$$

Notice that this is really two equations, one for the x component of the point and one for the y component, and these equations are linear in three unknowns, x, y, and t. Previously, we had described a line with one linear equation in two unknowns. We can convert between these two representations by using one equation to eliminate t; then we would obtain one linear equation in two unknowns.

Planes: Before we consider how to represent a line in 3D, it’s useful to look at a 2D plane in 3D. First, let’s take an example. Suppose we want to represent the floor of a room. We can say that these are all the points that have a height of zero.
If we use the y coordinate to represent height, we can represent this plane with the equation: y=0. Note that this is a linear equation in x, y, and z, although x and z don’t actually show up in the equation, since points on the floor can have any values for x and z. More generally, we can describe any plane with a linear equation of the form: Ax+By+Cz+D=0.

[Figure: a 2D line, labeled with the constant c’ and the unit vector (a’,b’).]

[...]

If we want to form a plane from a point and a line, one way to do this is to just pick two points from the line, and then use the above method. For those of you who are familiar with the cross-product, we note that given points P1, P2, and P3, we can find a vector normal to the plane of the three points by taking (P2 - P1)x(P3 - P1), where x is the cross-product operation. If we express the plane with the equation Ax+By+Cz+D=0, this gives us (A,B,C). We can use the coordinates of any of the points to solve for D. We won’t go into this method in detail, though, because we will try to stick with problems in which finding the plane formed by points, or a point and a line, is easy.

Perspective Projection

We now have the tools that we need to begin to solve some vision problems. We begin by describing the process of perspective projection. The key question that we must address is: given a description of the camera position and the location of a 3D point, where will this point appear in the image?

With perspective projection, we describe a camera using a focal point and an image plane. We imagine that light travels in a straight line from a scene point towards the focal point. The location where the light ray intersects the image plane is the image location for this scene point. In a pinhole camera, the focal point is the pinhole, and the light passes through it on the way to the image plane, which might be a CCD or film. In our idealization of a pinhole camera, the image plane is in front of the pinhole, so the light strikes the image plane before it reaches the focal point.
Either way, we can find the image point by forming a line that includes the scene point and the focal point, and finding where it intersects the image plane. We have explained above how to perform these operations, so this tells us how to find the image point corresponding to a scene point, for a general camera position.

As an example, suppose we have a camera with a focal point at (1,2,3), with an image plane at the x=2 plane, and we wish to find the image produced by a point in the scene at the location (9,6,5). We can describe the line that joins the scene point to the focal point with the equation: (x,y,z) = (1,2,3) + t(8,4,2). In particular, x = 1 + 8t, and we want to find the point where this line intersects the x=2 plane (i.e., the point on the line that has x = 2). This occurs when 2 = 1 + 8t, or when t = 1/8. The point on the line for t = 1/8 is (2, 2.5, 3.25).

We will often consider a special camera set-up (see figure) which makes it much easier to compute the image points produced by scene points. This is the case in which the focal point is at the origin, and the image plane is the z=f plane, where f is called the focal length of the camera. Suppose now we have a scene point P, with coordinates (X,Y,Z), which produces an image point, p. The point p has coordinates (x,y), or alternatively we can think of it as a 3D point on the image plane, with coordinates (x,y,f). We notice that there are two similar triangles, one with corners at (0,0,0), (X,Y,Z), and (0,0,Z) (in the figure, the light gray triangle together with the dark gray triangle at its tip) and the other with corners at (0,0,0), (x,y,f), and (0,0,f) (the dark gray triangle). The first of these triangles is the same as the second one, but scaled by a factor of Z/f. This means that the side of the big triangle with corners (0,0,0) and (X,Y,Z) is the same as the side with corners at (0,0,0) and (x,y,f), but scaled by a factor of Z/f. This tells us that (Z/f)(x,y) = (X,Y), which implies that (x,y)=f(X/Z, Y/Z).
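The two projection computations described above can be sketched in Python. This is only an illustrative sketch, not part of the original notes: the function names are hypothetical, and a plane is assumed to be passed as the tuple (A, B, C, D) of the equation Ax+By+Cz+D=0.

```python
def intersect_line_with_plane(point, direction, plane):
    """Intersect the line point + t*direction with the plane Ax+By+Cz+D=0.

    Hypothetical helper: the plane is given as the tuple (A, B, C, D).
    """
    A, B, C, D = plane
    px, py, pz = point
    dx, dy, dz = direction
    denom = A * dx + B * dy + C * dz
    if denom == 0:
        raise ValueError("line is parallel to the plane")
    # Substitute the line into the plane equation and solve for t.
    t = -(A * px + B * py + C * pz + D) / denom
    return (px + t * dx, py + t * dy, pz + t * dz)


def project_general(scene_point, focal_point, image_plane):
    """Image of scene_point for an arbitrary focal point and image plane."""
    direction = tuple(s - f for s, f in zip(scene_point, focal_point))
    return intersect_line_with_plane(focal_point, direction, image_plane)


def project_standard(scene_point, f=1.0):
    """Focal point at the origin, image plane z = f: (x, y) = f (X/Z, Y/Z)."""
    X, Y, Z = scene_point
    if Z == 0:
        raise ValueError("point has no image: Z = 0")
    return (f * X / Z, f * Y / Z)


# The worked example: focal point (1,2,3), image plane x = 2, scene point (9,6,5).
# The plane x = 2 is written as 1*x + 0*y + 0*z - 2 = 0.
print(project_general((9, 6, 5), (1, 2, 3), (1, 0, 0, -2)))  # (2.0, 2.5, 3.25)
```

The general routine reproduces the worked example with t = 1/8, while the standard set-up needs only a division by Z.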
The relation (x,y) = f(X/Z, Y/Z) is the basic equation of perspective projection.

[Figure: perspective projection, showing the focal point (0,0,0), the image plane z=1, the scene point P=(X,Y,Z), its image p=(x,y,1), and the point (0,0,Z).]

The vanishing point and the horizon

We can now derive some basic facts about perspective projection. The first is that any point in front of the camera will project to a point in the image plane. The scene point and the focal point form a line that will intersect the image plane in a single point. When we say that a point is in front of the camera, we mean that the image plane separates the scene point and the focal point, so that the line segment connecting the two points will intersect the image plane.

If a line is in front of the camera, it will generally project to a line in the image plane. Every scene point on the line will project into the image along a line connecting it to the focal point. Collectively, all these projection lines connect the scene line to the focal point, forming the plane that includes the scene line and the focal point. In general, this plane will intersect the image plane in a line. There are two special cases to consider, though. We’ll discuss one below. Here we mention the case in which the scene line includes the focal point. In this case, all lines of projection that connect a point on the scene line to the focal point are, in fact, identical to the scene line. Since all of these lines are identical, they all intersect the image plane in the same point. One can visualize this by imagining one is looking at a line end-on, so that it looks like just a point.

In general, a scene plane can fill the entire image. Given any image point, this point forms a line with the focal point, which will intersect the scene plane. This point on the scene plane, then, will project to the image at this image point. An interesting special case occurs, though, if the scene plane is orthogonal to the image plane. An important example of this is when the scene plane is the ground, and the camera is pointing in a horizontal direction.
We can describe such a situation with a camera that has a focal point at (0,0,0) and an image plane of z=1, and with a ground plane described by y=-k. In this case, the y direction is up, so the ground lies below the camera, and k is the height of the camera’s focal point above the ground. Now, let’s consider the projection of a point on the ground. A point on the ground has coordinates (x, -k, z), for arbitrary values of x and z. If the point is in front of the camera, then z > 1. Using the equation of projection, the image of this point will be (x/z, -k/z). This could be any location in the image, except that -k/z is always negative, so this point must always be in the bottom half of the image. The image of the plane occupies all points with negative y coordinates, up to the line y=0, which is called the horizon. Of course this accords with our experience that when we look at the world in a direction parallel to the ground, the ground is always in the bottom half of the image. It does not fill up the whole image. More generally, similar reasoning shows that when we look at any plane that is orthogonal to the image plane, the points in that plane will fill up half of the image.

There is one last special case, which occurs when a scene plane is situated so that the focal point lies in the same plane. In this case, every line that connects a point in the scene plane to the focal point lies in this plane. This plane intersects the image plane in a line, so the images of all points in the scene plane lie along a line in the image. This is what happens, for example, when you look at a sheet of paper end-on, so that it looks like a thin line.

Now let’s consider what happens when a scene line lies in a plane orthogonal to the image plane, for example, a line that lies on the ground plane. We can describe a line on the ground plane with the equation:

$$(x,y,z) = (x_0,y_0,z_0) + t\,(u,0,w)$$

[...]

the intersection of Q and the first image plane.
Since p1 is on Q and it’s on the first image plane, p1 is on the intersection of these, L1. Similarly, p2 must lie on L2.

This already gives us some very useful information about the relationship between image points of the same scene point. First, suppose we know the camera geometry that created two images. This situation is called stereo. Now, suppose we see a point, P, in the first image, giving us p1, but we do not know the location of P. We can use f1, f2, and p1 to determine Q. We can then use Q and the location of the second image plane to determine the line L2. Now we know that the image of P in the second image, p2, must lie on this line, L2. So seeing a point in one image is enough to narrow down its location in the second image to a single line.

We can go a bit further. If we observe an image point anywhere on the line L1, we get exactly the same epipolar plane, Q. This means that any point in the first image that lies on L1 must match a point in the second image that lies on L2. And the reverse is true by similar reasoning. So L1 and L2 give us two sets of points that can only be matched to each other. When we know the camera geometry, the problem of matching points in 2D images can be reduced to the problem of matching points along 1D lines.

The Epipole: The epipolar plane depends on the scene point, P, and different scene points give rise to different epipolar planes and lines. However, the epipolar lines in an image all have something in common: in general, they all pass through the same image point. This point is called the epipole. To see this, consider the line that goes through f1 and f2. Call this line F. Since f1 and f2 are points on any epipolar plane, no matter which scene point produces this epipolar plane, F is always on the epipolar plane. We will call the point where F intersects the second image plane e2. This is the epipole in image 2.
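As a small illustration of this definition, the epipole can be computed by intersecting the line F through the two focal points with the second image plane. The sketch below is an assumption-laden example rather than part of the notes: the function name is hypothetical, the image plane is assumed to be given as (A, B, C, D) for Ax+By+Cz+D=0, and the function returns None when F is parallel to the plane, so that no intersection exists.

```python
def epipole(f1, f2, image_plane):
    """Point where the line F through f1 and f2 meets the given image plane.

    Hypothetical helper: image_plane is (A, B, C, D) for Ax+By+Cz+D=0.
    Returns None when F is parallel to the plane.
    """
    A, B, C, D = image_plane
    # Direction of F, from f1 towards f2.
    d = tuple(b - a for a, b in zip(f1, f2))
    denom = A * d[0] + B * d[1] + C * d[2]
    if denom == 0:
        return None
    # Solve A*x + B*y + C*z + D = 0 along f1 + t*d.
    t = -(A * f1[0] + B * f1[1] + C * f1[2] + D) / denom
    return tuple(p + t * q for p, q in zip(f1, d))


# Made-up camera positions: focal points (0,0,0) and (2,0,1),
# with the second image plane taken to be z = 2.
print(epipole((0, 0, 0), (2, 0, 1), (0, 0, 1, -2)))  # (4.0, 0.0, 2.0)

# Focal points separated only in x, image plane z = 1: F is parallel to the plane.
print(epipole((0, 0, 0), (2, 0, 0), (0, 0, 1, -1)))  # None
```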
For any scene point, e2 is on the epipolar plane for that scene point, and on the image plane, so it is always on the epipolar line. This means that all epipolar lines intersect at e2.

There is one important special case where the above statements are not true. It is possible that F does not intersect the second image plane, because they are parallel. In that case F and any epipolar line lie in the same epipolar plane, but they do not intersect. This can only happen if F is parallel to every epipolar line. If all the epipolar lines are parallel to F, then they must all be parallel to each other. So, either the epipolar lines all intersect at the epipole, or they are all parallel to each other.

Standard Stereo: So far we have discussed the general case of two images taken with any camera positions. However, it will often be convenient to consider the case of two cameras in a particularly simple position, analogous to the position of our two eyes. We suppose that the two cameras are side by side, with focal points at z=0 and y=0, separated only in the x direction. For example, the focal points might be at the locations (0,0,0) and (T,0,0). In addition, we assume that the cameras are pointing in the z direction with the same focal length, so we can assume that they both have the same image plane, z = 1. As explained above, all the epipolar lines will be parallel to the line that joins the focal points, which is the x axis. This means that all the epipolar lines are horizontal lines in the two images. If a scene point appears at the location (x1,y1,1) in the first image, it will appear along the line y = y1 in the second image.

Now, let’s suppose we see the scene point, P, in both images. Suppose that it appears in the first image at the location p1 = (x1, y1, 1). Then the epipolar constraint tells us that it will appear in the second image at some location p2 = (x2, y1, 1), i.e., with the same y coordinate, but a different x coordinate.
Then we can notice that we have two similar triangles, one involving f1, f2, P, and the other with p1, p2, P. The base of the first triangle has a width of T, while the base of the second triangle has a width of x2-x1. We will define the disparity to be the difference in x coordinates caused by depth, that is, d = (T-x2) - (0-x1) = T + (x1-x2). Now, the depth of the point P, that is, its z coordinate, is given by the equations:

$$
\begin{aligned}
\frac{z-1}{z} &= \frac{x_2-x_1}{T} \\
Tz - T &= z\,(x_2-x_1) \\
z\,(T + x_1 - x_2) &= T \\
z &= \frac{T}{d}
\end{aligned}
$$

That is, when we see a point in two images, its depth is inversely proportional to its disparity, and proportional to the distance between the two camera centers. So, if a point has a disparity of zero, its depth is infinite. That is, points that are infinitely far away will appear in the same position in both images.

[Figure: epipolar geometry, showing the scene point P, the two focal points, image planes I1 and I2, image points p1 and p2, the epipolar plane Q, epipolar lines L1 and L2, epipoles e1 and e2, and the line F.]

Rectification

What if our cameras are not in this nice configuration? It turns out that we can rectify the images to produce the images that we would have gotten if our cameras were arranged like this. First, we note that we can just define our coordinate system so that the focal point of the first camera is the origin, the line connecting the focal points is the x axis, and the second focal point is located at some position (T,0,0). The only thing we have to worry about is the possibility that the image planes are not the z=1 plane. However, if we have an image taken with a particular focal point, f, and image plane, we can generate a new image that shows the world as seen by a camera with the same focal point, but a different image plane. Suppose our camera has an image plane I, and we want to generate an image with an image plane J. For any point, p, in J, we can find the line L that goes through p and f. We intersect L with the original image plane, I, obtaining the point q.
The point in the world, P, that created the intensity at q lies on the line L, which connects q to the focal point. If we had taken a picture using f and J as our focal point and image plane, the line connecting P and f would still be L, and it would intersect J at the point p. So the same world point that generated the intensity at q would also create the intensity at p. So we can create the image at J by transferring the intensity from q to p.

[Figure: standard stereo geometry, showing image points (x1,y,1) and (x2,y,1), the separation T between the focal points, and the depth z.]
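The standard stereo relation z = T/d derived earlier can be checked numerically with a short Python sketch. This is an illustration only: the function names are hypothetical, and it assumes, as in the notes, focal points at (0,0,0) and (T,0,0), a shared image plane z = 1, and image x coordinates measured in the common world coordinate system.

```python
def project_standard_stereo(P, T):
    """Image points of P = (X, Y, Z) for focal points (0,0,0) and (T,0,0).

    Both cameras are assumed to share the image plane z = 1, and x1, x2
    are world x coordinates of the image points on that plane.
    """
    X, Y, Z = P
    x1 = X / Z              # ray from (0,0,0) to P, evaluated at z = 1
    x2 = T + (X - T) / Z    # ray from (T,0,0) to P, evaluated at z = 1
    y = Y / Z               # same in both images (epipolar constraint)
    return (x1, y), (x2, y)


def depth_from_disparity(x1, x2, T):
    """Depth z = T / d, with disparity d = T + (x1 - x2)."""
    d = T + (x1 - x2)
    if d == 0:
        return float("inf")  # zero disparity: point infinitely far away
    return T / d


# Made-up scene point at depth 4, cameras separated by T = 2.
p1, p2 = project_standard_stereo((3, 2, 4), T=2)
print(p1, p2)                                   # (0.75, 0.5) (2.25, 0.5)
print(depth_from_disparity(p1[0], p2[0], T=2))  # 4.0
```

Projecting a known point and then recovering its depth from the two x coordinates gives back the original z, which is a quick consistency check on the sign conventions used for the disparity.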