Non-Rigid Shape and Motion Recovery: Degenerate Deformations | CAP 6412, Papers of Computer Science

Material Type: Paper; Class: Advanced Computer Vision; Subject: Computer Applications; University: University of Central Florida; Term: Unknown 2005;


Uploaded on 11/08/2009

Non-Rigid Shape and Motion Recovery: Degenerate Deformations

Jing Xiao, Takeo Kanade
The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
{jxiao, tk}@cs.cmu.edu

Abstract

This paper studies the problem of 3D non-rigid shape and motion recovery from a monocular video sequence under degenerate deformations. The shape of a deformable object is regarded as a linear combination of certain shape bases. When the bases are non-degenerate, i.e. of full rank 3, a closed-form solution exists by enforcing linear constraints on both the camera rotation and the shape bases [18]. In practice, degenerate deformations occur often, i.e. some bases are of rank 1 or 2. For example, cars moving or pedestrians walking independently on a straight road correspond to rank-1 deformations of the scene. This paper quantitatively shows that, when the shape is composed of only rank-3 and rank-1 bases, i.e. the 3D points either are static or independently move along straight lines, the linear rotation and basis constraints are sufficient to achieve a unique solution. When the shape bases contain rank-2 ones, imposing only the linear constraints results in an ambiguous solution space. In such cases, we propose an alternating linear approach that imposes the positive semi-definite constraint to determine the desired solution in the solution space. The performance of the approach is evaluated quantitatively on synthetic data and qualitatively on real videos.

1. Introduction

Recovery of 3D shape and motion from a monocular video sequence is an important task for applications like human computer interaction and robot navigation. Decades of work have led to significant successes on this problem. When the scene is static, reliable systems exist for 3D reconstruction of the scene structure.
In reality, many scenes are dynamic and non-rigid: expressive faces, cars moving beside buildings, etc. Such scenes often deform with a class of basis structures. For example, the shape of a face can be regarded as a weighted sum of some shape bases, which correspond to various facial expressions [3]. Bregler and his colleagues [5] first introduced the basis representation to the problem of non-rigid structure from motion. Using this representation, in [18], we presented two sets of linear metric constraints: orthonormality constraints on camera rotations (rotation constraints) and uniqueness constraints on shape bases (basis constraints). We proved that, when the shape deformation is non-degenerate, i.e. all bases are of full rank 3, enforcing the linear constraints leads to a closed-form solution [18].

In practice, many scenes deform with degenerate bases of rank 1 or 2. Such bases limit the shape to deform only in a 2D plane. For instance, if a scene contains pedestrians walking independently along straight lines, the bases referring to those rank-1 translations are degenerate. A simple illustration of rank-3, rank-2, and rank-1 bases is shown in Figure 1.

Under degenerate deformations, enforcing the linear metric constraints is not necessarily sufficient to determine a unique solution. This paper demonstrates that, when the shape involves rank-2 bases, the linear constraints lead to an ambiguous solution space that contains invalid solutions. The number of degrees of freedom of the space is determined by the number of rank-2 bases. In such situations, we show that a valid solution in the space must be positive semi-definite. We then present an alternating linear optimization approach that combines the linear metric constraints and the positive semi-definite constraint to determine the desired solution.

When the shape bases are of either rank 3 or rank 1, i.e.
all the 3D points in the scene either are static or independently move along straight lines, the linear metric constraints provide a unique solution for reconstructing the dynamic scene structure and camera motion. Note that such special degenerate deformations often occur in real applications. For example, when several people walk independently along different directions, each of the independent motions corresponds to a shape basis and all of them are of rank 1. Most previous approaches [1, 7, 17] on degenerate deformations were proposed for this special case. However, they require either that the moving velocities are constant [7, 17] or that the camera projection matrices are given [1].

2. Previous Work

The problem of 3D shape and motion recovery from 2D image sequences has attracted a lot of attention. Various approaches have been proposed for different applications [12, 10, 18]. Our discussion will focus on the factorization methods that are closely related to our work.

Figure 1: (Left): Three points (red) simultaneously move along fixed directions in the 3D space. Their trajectories form a deformation basis of rank 3. (Middle): Two points move along fixed directions within a 2D plane. Their trajectories form a rank-2 shape basis. (Right): One point moves along a fixed direction. Its trajectory forms a rank-1 basis.

The factorization method was first proposed by Tomasi and Kanade [12]. First it applies the rank constraint to factorize a set of feature locations tracked across the entire sequence. Then it uses the orthonormality constraints on the camera rotations to reconstruct the shape and motion in one step. This approach and its extensions to various camera projection models [9, 14] work for static scenes.

Costeira and Kanade [6] proposed a method that factorizes the image measurement to segment multiple independently moving objects and individually recover their shapes.
Wolf and Shashua [16] derived a geometrical constraint, called the segmentation matrix, to reconstruct a scene containing two independently moving objects from two perspective views. Vidal and his colleagues [15] generalized this approach to the case of multiple independently moving objects. For reconstruction of scenes consisting of both static objects and objects moving along fixed directions, Han and Kanade [7] proposed a factorization method that achieves a unique solution assuming constant velocities. A more general solution for reconstructing shapes that deform at constant velocity is presented in [17].

Bregler and his colleagues [5] first introduced the basis representation of non-rigid shapes to embed the deformation constraints into the scene structure. By analyzing the low rank of the image measurements, they enforce the orthonormality constraints on camera rotations to factorize the non-rigid shape and motion. This method was extended to nonlinear optimization approaches in [13, 4]. These three methods impose only the constraints on rotations. In [18], we proved that enforcing only the rotation constraints leads to ambiguous and invalid solutions. We then introduced the uniqueness constraints on the shape bases and proved that imposing both the basis and the rotation constraints results in a linear closed-form solution, assuming the shape deformations are non-degenerate [18].

To reconstruct degenerate deformations, most previous approaches [1, 7, 17] assume strong prior knowledge on either shape or motion. The methods in [7, 17] require that the deformation velocity is constant. The method in [1] assumes that the trajectory of each 3D point is either a straight line or a conic and that the camera projection matrices are all given.

3. Problem Statement

Given 2D locations of $P$ feature points across $F$ frames, $\{(u, v)^T_{fp} \mid f = 1, \dots, F,\ p = 1, \dots, P\}$, our goal is to recover the motion of the non-rigid object relative to the camera, including rotations $\{R_f \mid f = 1, \dots, F\}$ and translations $\{t_f \mid f = 1, \dots, F\}$, and its 3D deforming shapes $\{(x, y, z)^T_{fp} \mid f = 1, \dots, F,\ p = 1, \dots, P\}$, under the weak-perspective projection model.

We follow the representation of [3, 5]. The non-rigid shape is represented as a linear combination of $K$ shape bases $\{B_i,\ i = 1, \dots, K\}$. The bases are $3 \times P$ matrices controlling the deformation of $P$ points. Then the 3D coordinate of the point $p$ at the frame $f$ is

$$X_{fp} = (x, y, z)^T_{fp} = \sum_{i=1}^{K} c_{fi}\, b_{ip} \tag{1}$$

where $b_{ip}$ is the $p$th column of $B_i$ and $c_{fi}$ is its combination coefficient at the frame $f$. The image coordinate of $X_{fp}$ under the weak-perspective projection model is

$$x_{fp} = (u, v)^T_{fp} = s_f (R_f \cdot X_{fp} + t_f) \tag{2}$$

where $R_f$ stands for the first two rows of the $f$th camera rotation and $t_f = (t_{fx}\ t_{fy})^T$ is its translation relative to the world origin. $s_f$ is the nonzero scalar of the weak-perspective projection. Replacing $X_{fp}$ using Eq. (1) and absorbing $s_f$ into $c_{fi}$ and $t_f$, we have

$$x_{fp} = \begin{pmatrix} c_{f1} R_f & \cdots & c_{fK} R_f \end{pmatrix} \cdot \begin{pmatrix} b_{1p} \\ \vdots \\ b_{Kp} \end{pmatrix} + t_f \tag{3}$$

Suppose the image coordinates of all $P$ feature points across $F$ frames are obtained. We form a $2F \times P$ measurement matrix $W$ by stacking all image coordinates. Then $W = MB + T\,(1\ 1\ \cdots\ 1)$, where $M$ is a $2F \times 3K$ scaled rotation matrix, $B$ is a $3K \times P$ bases matrix, and $T$ is a $2F \times 1$ translation vector,

$$M = \begin{pmatrix} c_{11} R_1 & \cdots & c_{1K} R_1 \\ \vdots & \ddots & \vdots \\ c_{F1} R_F & \cdots & c_{FK} R_F \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & \cdots & b_{1P} \\ \vdots & \ddots & \vdots \\ b_{K1} & \cdots & b_{KP} \end{pmatrix}, \quad T = \begin{pmatrix} t_1 \\ \vdots \\ t_F \end{pmatrix} \tag{4}$$

As in [7, 5], we position the world origin at the scene center and compute the translation vector by averaging the image projections of all points. We then subtract it from $W$ [...]

[...] bases are either rank-3 or rank-1 ($K_2 = 0$), the metric constraints generate a unique solution ($N_D = 0$).
Otherwise, when there exist rank-2 bases ($K_2 > 0$), the solution is ambiguous ($N_D > 0$).

5. Solutions

5.1. Determine the Number of the Bases

To utilize the rotation and basis constraints, we need to know the number of rank-3, rank-2, and rank-1 bases. First let us determine $K_d$, the rank of $\tilde{W}$. We perform SVD on $\tilde{W}$ and obtain the singular values. In noiseless settings, $K_d$ equals the number of non-zero singular values. When noise exists, $K_d$ is estimated as the smallest number of leading singular values whose sum is larger than some percentage (99% in our experiments) of the sum of all the singular values.

We then decide $K_3$, the number of non-degenerate bases. Because these bases are of rank 3, $1 \le K_3 \le K_d/3$. In the previous section, we showed that the basis constraints only determine the rank-3 bases, i.e. only rank-3 bases satisfy the basis constraints. Thus we choose $K_3$ as the largest number from 1 to $K_d/3$ for which the linear constraints (Eq. (9,10,13∼16)) are satisfied.

We now determine $K_2$, the number of rank-2 bases. According to Theorem 2, the rank of the linear constraints is a quadratic function of $K_2$. Because $K_3$ is known, we can compute the rank of Eq. (9,10,13∼16) and calculate $K_2$ as a root of the function. Finally $K_1$, the number of rank-1 bases, is $K_d - 2K_2 - 3K_3$.

5.2. An Alternating Linear Solution under the Existence of Rank-2 Shape Bases

By Theorem 2, when rank-2 shape bases exist ($K_2 > 0$), imposing the metric constraints (Eq. (9,10,13∼16)) leads to an ambiguous solution space. By definition, $Q_i = \tilde{g}_i \tilde{g}_i^T$ is positive semi-definite. According to Eq. (17), if any of the skew-symmetric matrices $Y_{mn}$ in $H$ is not zero, $H$ is not positive semi-definite, and neither is $Q_i$, which equals $GHG^T$. Thus the solution space contains invalid solutions. The $Y_{mn}$ must be zero for $Q_i$ to be a valid solution. We thus develop an alternating linear method that enforces this constraint to uniquely determine a valid solution in the space.
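As a rough illustration of the rank estimate in Section 5.1, the following Python sketch picks the smallest number of leading singular values whose sum exceeds a given fraction of the total, and then applies the counting rule $K_1 = K_d - 2K_2 - 3K_3$. The function names `estimate_rank` and `count_rank1_bases`, the NumPy dependency, and the demo matrix are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def estimate_rank(W_tilde, energy=0.99):
    """Smallest number of leading singular values whose sum exceeds
    `energy` times the sum of all singular values (hypothetical helper)."""
    s = np.linalg.svd(W_tilde, compute_uv=False)  # sorted in descending order
    cumulative = np.cumsum(s) / np.sum(s)
    # Index of the first cumulative share above the threshold, as a count.
    return int(np.argmax(cumulative > energy)) + 1

def count_rank1_bases(K_d, K_3, K_2):
    """K_1 = K_d - 2*K_2 - 3*K_3, as stated at the end of Section 5.1."""
    return K_d - 2 * K_2 - 3 * K_3

# Demo matrix with known singular values 10, 8, 6, 4, 2 (illustrative data).
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((20, 5)))
V, _ = np.linalg.qr(rng.standard_normal((12, 5)))
s_true = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
W_demo = (U * s_true) @ V.T

K_d = estimate_rank(W_demo)
print("estimated K_d:", K_d)
print("K_1 for K_3 = 1, K_2 = 0:", count_rank1_bases(K_d, 1, 0))
```

The threshold is a tuning choice: a lower `energy` discards weak singular values more aggressively, which may be preferable under heavy noise but risks underestimating $K_d$.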
Because the linear solution space has $N_D$ degrees of freedom, we represent $Q_i$ as a weighted sum of a particular solution and $N_D$ homogeneous solutions,

$$Q_i = \Lambda_0 + \sum_{m=1}^{N_D} \lambda_m \Lambda_m \tag{19}$$

where $\Lambda_0$ is the particular solution and $\Lambda_1, \dots, \Lambda_{N_D}$ are the homogeneous solutions. The scalars $\lambda_m$ are the only unknowns to solve for. Our algorithm consists of three steps:

1. Use the particular solution $\Lambda_0$ as the initial estimate of $Q_i$.
2. Apply SVD on $Q_i$ to compute its best possible rank-3 approximation $\tilde{g}_i \tilde{g}_i^T$.
3. Given $\tilde{g}_i$, calculate the coefficients $\lambda_m$ in Eq. (19) by the linear least-squares method. Then update $Q_i$ via Eq. (19).

The last two linear processes are repeated alternately until they converge. Note that the positive semi-definite constraint $Q_i = \tilde{g}_i \tilde{g}_i^T$ is explicitly enforced. Once $\tilde{g}_i$, $i = 1, \dots, K_3$, are determined, according to Eq. (6), we reconstruct the rotations and the associated coefficients.

So far we have recovered the columns of $G$ that refer to the non-degenerate bases and the camera rotations. We now recover the other columns, $g_{3K_3+1}, \dots, g_{K_d}$, which correspond to the degenerate bases. From the second equation in Eq. (6), we cancel the unknown coefficients and obtain $F$ constraints on $g_j$ and $r_j$,

$$(\tilde{M}_{2m-1}\, g_j\, R_{m,2} - \tilde{M}_{2m}\, g_j\, R_{m,1})\, r_j = 0, \quad m = 1, \dots, F \tag{20}$$

where $R_{m,1}$ means the first row of the rotation matrix $R_m$. Due to Eq. (12), we obtain another $2K_3$ constraints on $g_j$,

$$\tilde{M}_m\, g_j = 0, \quad m = 1, \dots, 2K_3 \tag{21}$$

We then apply the following alternating linear approach to determine $g_j$ and $r_j$:

1. Calculate a particular solution of Eq. (21) as the initial estimate of $g_j$.
2. Given $g_j$, calculate the rank-1 null space of Eq. (20) as the solution of $r_j$.
3. Given $r_j$, solve Eq. (20) and (21) to update $g_j$.

The last two linear processes are repeated alternately until they converge. In these processes, we constrain $G$ to be non-singular by forcing its columns to be independent of each other.
This prevents the algorithm from converging to trivial solutions, e.g. $g_j$ and $r_j$ both being zero. Now we have completely recovered the corrective transformation $G$. The associated coefficients and the shape bases are computed using Eq. (6) and (5) respectively. Their composition then reconstructs the non-rigid shapes as in Eq. (1).

5.3. A Unique Solution when Rank-2 Shape Bases do not Exist

A special case of degenerate deformations, in which all the points on the non-rigid shape either are static or independently move along straight lines, often occurs in practice. For example, cars drive or pedestrians walk independently along straight lines beside a house. Several approaches [1, 7, 17] have been developed specifically for such degenerate deformations. However, they require strong prior knowledge on either shape or motion. For example, assuming the camera projection matrices and the feature correspondences are given across five or more views, [1] presents the trajectory triangulation technique that uniquely reconstructs the 3D shape and motion trajectories.

In such cases, the shape bases are either rank-3 or rank-1. In the above example, the bases referring to the independent motions of cars or pedestrians are of rank 1 and the basis corresponding to the static house is of rank 3. Because $K_2 = 0$, according to Theorem 2, enforcing the linear metric constraints (Eq. (9,10,13∼16)) leads to a unique solution of $Q_i$. Using SVD, we can factorize $Q_i$ to compute $\tilde{g}_i$. Then the camera rotations can be recovered using Eq. (6). Under the weak-perspective projection model, given the recovered rotations, we can construct the projection matrix up to a scalar,

$$\Omega_i = \begin{pmatrix} R_i & \mathbf{0}_{2 \times 1} \\ \mathbf{0}_{1 \times 3} & 1 \end{pmatrix} \tag{22}$$

where the translation has been eliminated by moving the origin to the center of all points. We then apply the trajectory triangulation technique [1, 11] to uniquely reconstruct the 3D shapes and motion trajectories.
For details of the trajectory triangulation technique, refer to [1, 11]. Note that we do not require the assumptions that previous approaches did.

6. Performance Evaluation

The performance of our approach is evaluated in a number of experiments. First, we evaluate its robustness and accuracy quantitatively on synthetic data. Second, we apply it to real image sequences to examine it qualitatively.

6.1. Quantitative Evaluation on Synthetic Data

Our approach is first quantitatively evaluated on synthetic data. We test its accuracy and robustness with respect to two factors: the number of degenerate bases and the strength of noise. Since the number of unknowns involved in the alternating linear algorithm depends only on the number of rank-2 bases, we choose all the degenerate bases to be of rank 2 in the experiments. Thus more degenerate bases result in a more complex optimization process. Assuming Gaussian white noise, we represent the noise strength level by the ratio between the Frobenius norms of the noise and the measurement, i.e. $\|\text{noise}\| / \|\tilde{W}\|$. In general, when noise exists, the larger the number of degenerate bases, the more complicated the optimization process and thus the worse its performance.

Figure 3 shows the evaluation on a 10-bases setting. The number of degenerate bases ranges from 1 to 9, shown on the horizontal axes. Four levels of Gaussian white noise are imposed. Their strength levels are 0%, 5%, 10%, and 20% respectively. We run a number of trials on each setting. The average reconstruction errors on the rotations and 3D shapes relative to the ground truth are shown in Figure 3.
In the experiments when the noise level is 0%, regardless of how many bases are degenerate, our method converges to the exact rotations and shapes with zero error. When there is noise, it achieves reasonable accuracy, e.g. the maximum reconstruction error is less than 20% when the noise level is 20% and 9 out of 10 bases are degenerate. As we expected, under the same noise level, the performance is better when more bases are non-degenerate.

Figure 3: The relative reconstruction errors under different levels of noise and various numbers of degenerate bases. Each curve refers to a respective noise level. (Two plots: relative reconstruction errors on rotations (%) and on shapes (%), each against the number of degenerate bases out of 10 shape bases, for ||noise|| = 0%, 5%, 10%, and 20% of ||W||.)

6.2. Qualitative Evaluation on Real Video Sequences

We then examine our approach qualitatively on a number of real video sequences. One example is shown in Figure 4. The sequence was taken of an indoor scene by a handheld camera. The dynamic scene consisted of a static table and two boxes moving on top of the table. The boxes moved independently along the straight borders of the table top and at varying velocities. The scene structure is thus composed of three shape bases, one representing the static table and the initial locations of the two boxes and the other two representing the two linear motion vectors respectively. Since the box vertices and the table corners are not located in the same plane, the first shape basis is of rank 3. The other two bases are both of rank 1. Thus the rank of the image measurement $\tilde{W}$ is 5.
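The rank count above can be checked numerically. The sketch below builds a synthetic scene in the spirit of Eq. (3) with one rank-3 basis and two rank-1 bases, projects it with random weak-perspective rotations, removes the translation by subtracting the per-row mean, and measures the rank of the result. All names (`random_rotation`, `synthetic_measurement`), the point groupings, and the random data are assumptions for illustration and do not reproduce the experiment in the paper.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation from a QR decomposition (illustrative helper)."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0  # flip one axis so that det(Q) = +1
    return Q

def synthetic_measurement(F=30, P=18, seed=0):
    """Centered 2F x P measurement matrix for 1 rank-3 + 2 rank-1 bases."""
    rng = np.random.default_rng(seed)
    B1 = rng.standard_normal((3, P))           # static structure, rank 3
    a2 = np.zeros(P); a2[:4] = 1.0             # points assumed to lie on box 1
    a3 = np.zeros(P); a3[4:8] = 1.0            # points assumed to lie on box 2
    B2 = np.outer(rng.standard_normal(3), a2)  # one motion direction, rank 1
    B3 = np.outer(rng.standard_normal(3), a3)  # another direction, rank 1
    B = np.vstack([B1, B2, B3])                # 9 x P bases matrix
    rows = []
    for _ in range(F):
        R = random_rotation(rng)[:2]           # first two rows of the rotation
        c = np.array([1.0, rng.uniform(), rng.uniform()])  # varying velocities
        t = rng.standard_normal((2, 1))        # per-frame 2D translation
        M_f = np.hstack([ci * R for ci in c])  # (c_f1 R_f, c_f2 R_f, c_f3 R_f)
        rows.append(M_f @ B + t)
    W = np.vstack(rows)
    # Remove the translation by subtracting the mean over all points.
    return W - W.mean(axis=1, keepdims=True)

W_tilde = synthetic_measurement()
print(W_tilde.shape, np.linalg.matrix_rank(W_tilde, tol=1e-8))
```

For generic random data the reported rank should match $3K_3 + 2K_2 + K_1 = 3 + 0 + 2 = 5$; with noise added, the numerical rank would instead have to be estimated from the singular-value spectrum.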
18 feature points, consisting of the table corners and visible vertices of the boxes, across 30 images are given for reconstruction. Two of the images are shown in Figure 4.(a,b). The numbers of the three types of bases are determined as described in Section 5.1. The camera rotations and dynamic scene structure are then reconstructed by the alternating linear algorithm. To evaluate the reconstruction, we synthesize the scene appearance viewed from one side, as shown in Figure 4.(c,d). The wireframes show the structure and the yellow lines show the trajectories of the moving boxes from the beginning of the sequence until the present frames. The recovered structure is consistent with our observation, e.g. the boxes approximately move along the table top borders. Figure 4.(e,f) show the reconstructed scene viewed from the top. Because the scene structure is composed of rank-1 and rank-3 bases, we also tested the unique solution described in Section 5.3 on this setting and achieved similar results.

Occlusion was not taken into account when rendering these images. So in the regions that should be occluded, e.g. the areas behind the boxes, the stretched texture of the occluding objects appears. Our approach assumes the weak-perspective projection model, which requires the scene to be far from the camera. However, in this experiment the images were not taken from a long distance. Due to the perspective effect, the recovered object shapes are somewhat distorted, e.g. the shapes of the boxes are not precisely cuboid.

Human faces are highly non-rigid objects, and 3D face shapes can be represented as linear combinations of certain shape bases that refer to various facial expressions. Under some facial motions, e.g. eye opening, the deformations along the horizontal and vertical directions are dominant and those along the depth direction are relatively subtle.
Under the expressions where these degenerate motions are accompanied by other facial deformations such that the corresponding bases for the entire face shape are non-degenerate, the non-rigid shapes can be recovered using the method in [18]. Under some expressions, e.g. yawning and blinking, the facial deformations are mainly composed of these degenerate motions and thus the corresponding bases are close to degenerate. In such cases, we have to utilize the alternating linear method. One example is shown in Figure 5. The sequence consists of 180 face images that contain expressions like blinking and smiling. 68 feature points were tracked using an efficient Active Appearance Model (AAM) method [2]. Figure 5.(a,b) display two input images with marked features. Their corresponding shapes are reconstructed and shown from a novel view in Figure 5.(c,d). The overlapped wireframes demonstrate the recovered facial deformations such as mouth widening when smiling and eye closure when blinking.

Figure 4: Reconstruction of two boxes independently moving along the borders of a static table top. (a)&(b): Two input images with marked features. (c)&(d): Reconstructed scene appearance viewed from one side. The wireframes show the structure and the yellow lines show the trajectories of the boxes from the beginning of the sequence until the present frames. (e)&(f): Reconstructed scene appearance viewed from the top.

7. Conclusion and Discussion

This paper studies the problem of non-rigid structure from motion under degenerate deformations. We quantitatively demonstrate that when rank-2 bases exist, imposing only the linear metric constraints (Eq. (9,10,13∼16)) results in an ambiguous solution space. To eliminate the ambiguity, we develop an alternating linear approach that combines the metric constraints with the positive semi-definite constraint.
When the points on the shape either are static or independently move along straight lines, we present a unique solution for reconstructing the 3D shape and motion trajectories.
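The three-step alternation of Section 5.2 (start from the particular solution, project onto rank-3 positive semi-definite matrices, refit the weights of Eq. (19) by least squares) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes $\Lambda_0$ and the $\Lambda_m$ are symmetric, it uses a symmetric eigendecomposition in place of the SVD mentioned in the text, and the names `nearest_psd_factor` and `alternate_psd` as well as the demo data are hypothetical.

```python
import numpy as np

def nearest_psd_factor(Q, rank=3):
    """Factor g such that g @ g.T is the closest PSD matrix of rank <= `rank`
    to the symmetric part of Q (eigendecomposition swapped in for the SVD)."""
    Q = 0.5 * (Q + Q.T)
    w, V = np.linalg.eigh(Q)               # eigenvalues in ascending order
    w_top = np.clip(w[-rank:], 0.0, None)  # keep the largest, drop negatives
    return V[:, -rank:] * np.sqrt(w_top)

def alternate_psd(Lambda0, Lambdas, n_iter=100, tol=1e-10):
    """Alternate between the PSD projection (step 2) and the linear
    least-squares update of the weights in Eq. (19) (step 3)."""
    A = np.stack([L.ravel() for L in Lambdas], axis=1)
    lam = np.zeros(len(Lambdas))
    Q = Lambda0.copy()                     # step 1: particular solution
    residuals = []
    for _ in range(n_iter):
        g = nearest_psd_factor(Q)          # step 2
        residuals.append(np.linalg.norm(Q - g @ g.T))
        target = (g @ g.T - Lambda0).ravel()
        new_lam, *_ = np.linalg.lstsq(A, target, rcond=None)  # step 3
        Q = Lambda0 + np.tensordot(new_lam, np.stack(Lambdas), axes=1)
        converged = np.linalg.norm(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return Q, g, lam, residuals

# Illustrative data: a rank-3 PSD matrix hidden in a one-parameter family.
rng = np.random.default_rng(1)
g_true = rng.standard_normal((6, 3))
S = rng.standard_normal((6, 6))
Lambda1 = S + S.T                          # symmetric homogeneous direction
Lambda0 = g_true @ g_true.T - 0.3 * Lambda1
Q_est, g_est, lam_est, history = alternate_psd(Lambda0, [Lambda1])
print("first and last residual:", history[0], history[-1])
```

Because each step is a nearest-point projection onto one of the two sets, the distance between $Q_i$ and its rank-3 approximation cannot increase from one iteration to the next. Convergence to the desired solution is not guaranteed in general by this argument and may depend on the starting point.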