VIRTUAL REALITY
Jae-Jin Kim

Chapter 4
Virtual Reality to Simulate Visual Tasks for Robotic Systems

4.1 Introduction
Virtual reality (VR) can be used as a tool to analyze the interactions between the visual system of a robotic agent and the environment, with the aim of designing the algorithms that solve the visual tasks necessary to behave properly in the 3D world. The novelty of our approach lies in the use of VR as a tool to simulate the behavior of vision systems. The visual system of a robot (e.g., an autonomous vehicle, an active vision system, or a driving assistance system) and its interplay with the environment can be modeled through the geometrical relationships between the virtual stereo cameras and the virtual 3D world. Differently from conventional applications, where VR is used for the perceptual rendering of visual information to a human observer, in the proposed approach a virtual world is rendered to simulate the actual projections on the cameras of a robotic system. In this way, machine vision algorithms can be quantitatively validated by using the ground truth data provided by the knowledge of both the structure of the environment and the vision system.
In computer vision (Trucco & Verri, 1998; Forsyth & Ponce, 2002), in particular for motion analysis and depth reconstruction, it is important to quantitatively assess progress in the field, but too often researchers have reported only qualitative results on the performance of their algorithms, due to the lack of calibrated image databases. To overcome this problem, recent works in the literature describe test beds for a quantitative evaluation of vision algorithms that provide both sequences of images and ground truth disparity and optic flow maps (Scharstein & Szeliski, 2002; Baker et al., 2007). A different approach is to generate image sequences and stereo pairs by using a database of range images collected by a laser range-finder (Yang & Purves, 2003; Liu et al., 2008).

In general, the major drawback of calibrated data sets is the lack of interactivity: it is not possible to change the scene and the camera point of view. To overcome the limits of these approaches, several authors have proposed robot simulators equipped with visual sensors and capable of acting in virtual environments. Nevertheless, such software tools accurately simulate the physics of robots rather than their visual systems. In many works, stereo vision is left to future developments (Jørgensen & Petersen, 2008; Awaad et al., 2008), whereas other robot simulators in the literature have a binocular vision system (Okada et al., 2002; Ulusoy et al., 2004) but work on stereo image pairs acquired by parallel-axis cameras. More recently, a commercial application (Michel, 2004) and an open source project for cognitive robotics research (Tikhanoff et al., 2008) have been developed, both capable of fixating a target; nevertheless, ground truth data are not provided.
4.2 The visual system simulator
Figure 4.1a-b shows the real-world images gathered by a binocular robotic head for different stereo configurations: the visual axes of the cameras are kept parallel (Figure 4.1a) or convergent to fixate an object in the scene (the small tin, see Figure 4.1b). It is worth noting that both horizontal and vertical disparities take quite large values in the periphery, while disparities are zero at the fixation point. Analogously, if we look at the motion field generated by an agent moving in an environment where both still and moving objects are present (see Figure 4.1c), the resulting optic flow is composed both of ego-motion components, due to the motion of the observer, and of components due to the independent movements of the objects in the scene.

Figure 4.1 Binocular snapshots obtained by real-world vision systems. (a)-(b): The stereo image pairs are acquired by a binocular active vision system (http://www.searise.eu/) for different stereo configurations: the visual axes of the cameras are (a) kept parallel, (b) convergent to fixate an object in the scene (the small tin). The anaglyphs are obtained with the left image on the red channel and the right image on the green and blue channels. The interocular distance is 30 cm and the camera resolution is 1392 × 1236 pixels with a focal length of 7.3 mm. The distance between the cameras and the objects is between 4 m and 6 m. It is worth noting that both horizontal and vertical disparities are present. (c): Optic flow superimposed on a snapshot of the related image sequence, obtained by a car, equipped with a pair of stereo cameras with parallel visual axes, moving in a complex real environment. The resolution of the cameras is 1392 × 1040 pixels with a focal length of 6.5 mm, and the baseline is 33 cm (http://pspc.dibe.unige.it/drivsco/). Different situations are represented: ego-motion (due to the motion of the car) and the translating independent movement of a pedestrian (only the left frame is shown).
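The anaglyphs in Figure 4.1 are obtained with a simple channel recombination. The following sketch shows that operation for interleaved 8-bit RGB buffers; the function name and the buffer layout are our own illustrative choices, not code from the system described here.

    // Sketch of anaglyph composition as in Figure 4.1: left image on the red
    // channel, right image on the green and blue channels. Both inputs are
    // assumed to be interleaved 8-bit RGB buffers of identical size.
    #include <cstdint>
    #include <vector>

    std::vector<std::uint8_t> makeAnaglyph(const std::vector<std::uint8_t>& left,
                                           const std::vector<std::uint8_t>& right) {
        std::vector<std::uint8_t> out(left.size());
        for (std::size_t i = 0; i + 2 < left.size(); i += 3) {
            out[i]     = left[i];        // R taken from the left image
            out[i + 1] = right[i + 1];   // G taken from the right image
            out[i + 2] = right[i + 2];   // B taken from the right image
        }
        return out;
    }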
The aim of the work described in this chapter is to simulate the active vision system of a robot acting and moving in an environment, rather than the mechanical movements of the robot itself. In particular, we aim to precisely simulate the movements (e.g., vergence and version) of the two cameras and of the robot, in order to provide the binocular views and the related ground truth data (horizontal and vertical disparities and the binocular motion field). Thus, our VR tool can be used for two different purposes (see Figure 4.2):
1. to obtain binocular image sequences with related ground truth, in order to quantitatively assess the performance of computer vision algorithms;
2. to simulate the closed-loop interaction between the visual perception and the action of the robot.
The binocular image sequences provided by the VR engine can be processed by computer vision algorithms in order to obtain the visual features needed by the control strategy of the robot movements. These control signals act as an input to the VR engine, thus simulating the robot movements in the virtual environment; the updated binocular views are then obtained (this loop is sketched below). In the following, a detailed description of the model of a robotic visual system is presented.

Figure 4.2 The proposed active vision system simulator. Mutual interactions between a robot and the environment can be emulated to validate the visual processing modules in a closed perception-action loop and to obtain calibrated ground truth data.
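The closed loop just described can be made concrete with a short sketch. The fragment below wires together hypothetical stand-ins for the blocks of Figure 4.2 (VR engine, computer vision module, control module); none of the type or function names belong to the actual simulator, and the processing bodies are trivial placeholders.

    // Hypothetical sketch of the closed perception-action loop of Figure 4.2.
    #include <cstdio>

    struct StereoPair { int frame = 0; };           // stand-in for the two views
    struct Features   { double disparity = 0.0; };  // stand-in for visual features
    struct Control    { double vergence = 0.0; };   // stand-in for camera commands

    struct VREngine {
        int t = 0;
        StereoPair render() { return StereoPair{t}; }         // views at current step
        void applyControl(const Control& u) { (void)u; ++t; } // accept commands, advance state
    };

    Features processViews(const StereoPair& s) { return Features{0.1 * s.frame}; } // placeholder measurement
    Control  computeControl(const Features& f) { return Control{-f.disparity}; }   // placeholder strategy

    int main() {
        VREngine vr;
        for (int step = 0; step < 5; ++step) {     // closed perception-action loop
            StereoPair views = vr.render();        // 1. VR engine renders binocular views
            Features f = processViews(views);      // 2. computer vision module
            Control u = computeControl(f);         // 3. control strategy
            vr.applyControl(u);                    // 4. control signals back to VR engine
            std::printf("step %d: vergence command %.2f\n", step, u.vergence);
        }
        return 0;
    }

The essential design point is that the VR engine appears twice in every iteration: as the source of the binocular views and as the sink of the control signals.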
4.2.1 Tridimensional environment
The 3D scene is described by using the VRML format. Together with its successor X3D, VRML has been accepted as an international standard for specifying the vertices and edges of 3D polygons, along with surface color, UV-mapped textures, shininess, and transparency. Though a large number of VRML models are available, e.g. on the web, they usually do not have photorealistic textures and are often characterized by simple 3D structures. To overcome this problem, a dataset of 3D scenes, acquired in controlled but cluttered laboratory conditions, has been created by using a laser scanner. The results presented in Section 6 are obtained by using the dataset acquired in our laboratory.

It is worth noting that the complex 3D VRML models can be replaced at any time by simple geometric figures (cubes, cones, planes), with or without textures, in order to use the simulator as an agile testing platform for the development of complex computer vision algorithms (a minimal example of such a reduced scene is sketched below).
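As an illustration of such a reduced scene, the following fragment builds a minimal scene graph with a single textured cube using Coin3D / Open Inventor classes, consistent with the SoOffscreenRenderer class used in Section 4.2.2; the texture file name is a placeholder, and SoDB::init() must have been called once beforehand.

    // Minimal sketch: a simple test scene (one textured cube) that can stand in
    // for a complex VRML model. "checker.png" is a placeholder texture file.
    // SoDB::init() must be called once before any node is created.
    #include <Inventor/SoDB.h>
    #include <Inventor/nodes/SoSeparator.h>
    #include <Inventor/nodes/SoTexture2.h>
    #include <Inventor/nodes/SoCube.h>

    SoSeparator* makeSimpleScene() {
        SoSeparator* root = new SoSeparator;
        root->ref();

        SoTexture2* tex = new SoTexture2;        // optional texture
        tex->filename.setValue("checker.png");
        root->addChild(tex);

        SoCube* cube = new SoCube;               // simple geometric figure
        cube->width = 1.0f;
        cube->height = 1.0f;
        cube->depth = 1.0f;
        root->addChild(cube);

        return root;
    }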
4.2.2 Rendering
The scene is rendered in an on-screen OpenGL context (see Section 5 for details). Moreover, the SoOffscreenRenderer class is used to render the scene into off-screen buffers and to save the sequence of stereo pairs to disk. The renderer can produce stereo images of different resolutions, acquired by cameras with different fields of view. In particular, one can set the following parameters (a sketch of how they map onto the virtual camera follows the list):
(1) resolution of the cameras (the maximum usable resolution depends on the resolution of the textures and on the number of points of the 3D model);
(2) horizontal and vertical field of view (HFOV and VFOV, respectively);
(3) distance from the camera position to the near clipping plane of the camera's view volume, also referred to as the viewing frustum (nearDistance);
(4) distance from the camera position to the far clipping plane of the camera's view volume (farDistance);
(5) distance from the camera position to the point of focus (focalDistance).
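The following sketch shows how these parameters can be set through the Coin3D / Open Inventor API: it loads a VRML scene, configures the camera, and saves one off-screen rendering to disk. File names, resolution, and parameter values are illustrative; note that this API exposes the vertical field of view as heightAngle (in radians) together with an aspectRatio, from which the horizontal field of view follows.

    // Minimal sketch: loading a VRML scene and rendering it off-screen with the
    // camera parameters listed above (values and file names are illustrative).
    #include <Inventor/SoDB.h>
    #include <Inventor/SoInput.h>
    #include <Inventor/SoOffscreenRenderer.h>
    #include <Inventor/SbViewportRegion.h>
    #include <Inventor/nodes/SoSeparator.h>
    #include <Inventor/nodes/SoPerspectiveCamera.h>

    int main() {
        SoDB::init();

        SoInput in;                                  // load the 3D scene (VRML)
        if (!in.openFile("scene.wrl")) return 1;
        SoSeparator* scene = SoDB::readAll(&in);
        if (!scene) return 1;

        SoPerspectiveCamera* cam = new SoPerspectiveCamera;
        cam->position.setValue(0.0f, 0.0f, 2.0f);
        cam->nearDistance  = 0.1f;                   // (3) near clipping plane
        cam->farDistance   = 100.0f;                 // (4) far clipping plane
        cam->focalDistance = 2.0f;                   // (5) point of focus
        cam->heightAngle   = 0.7854f;                // (2) VFOV in radians (~45 deg)
        cam->aspectRatio   = 1392.0f / 1236.0f;      //     HFOV follows from the ratio

        SoSeparator* root = new SoSeparator;
        root->ref();
        root->addChild(cam);
        root->addChild(scene);

        SbViewportRegion vp(1392, 1236);             // (1) resolution of the camera
        SoOffscreenRenderer renderer(vp);            // off-screen buffer
        if (renderer.render(root))
            renderer.writeToFile("left_0000.png", "png"); // PNG needs the simage library

        root->unref();
        return 0;
    }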
4.2.3 Binocular head and eye movements
The visual system presented in this Section is able to generate the sequence of stereo image pairs of a binocular head moving in 3D space and fixating a 3D point (XF, YF, ZF). The geometry of the system and the parameters that can be set are shown in Figure 4.3.

Figure 4.3 Schematic representation of the geometry of the binocular active vision system.

The head is characterized by the following parameters (each expressed with respect to the world reference frame (XW, YW, ZW)):
(1) cyclopic position C = (XC, YC, ZC);
(2) nose orientation;
(3) fixation point F = (XF, YF, ZF).
Once the initial position of the head is fixed, different behaviours are possible (see the sketch after this list):
(1) to move the eyes while keeping the head (position and orientation) fixed;
(2) to change the orientation of the head, thus mimicking the movements of the neck;
(3) to change both the orientation and the position of the head, thus generating more complex motion patterns.
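A hypothetical data structure summarizing this configuration and the three behaviours is sketched below; the names are illustrative and do not come from the simulator.

    // Hypothetical head configuration and the three behaviours above
    // (illustrative names, not the simulator's API).
    struct Vec3 { double x, y, z; };

    struct HeadState {
        Vec3 cyclopic;   // cyclopic position C = (XC, YC, ZC)
        Vec3 nose;       // nose orientation (unit vector)
        Vec3 fixation;   // fixation point F = (XF, YF, ZF)
    };

    // (1) eye movements only: C and nose orientation stay fixed, F changes
    void moveEyes(HeadState& h, const Vec3& newF) { h.fixation = newF; }

    // (2) neck movements: orientation changes, position stays fixed
    void orientHead(HeadState& h, const Vec3& newNose) { h.nose = newNose; }

    // (3) full motion: both position and orientation change
    void moveHead(HeadState& h, const Vec3& newC, const Vec3& newNose) {
        h.cyclopic = newC;
        h.nose = newNose;
    }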
These situations allow the study of different perceptual problems, from scene exploration to navigation with ego-motion; thus, in the following (see Section 6), we will present the results obtained in different situations. For the sake of clarity and simplicity, in the following we will consider the position C = (XC, YC, ZC) and the orientation of the head fixed, so that only the ocular movements are considered. In Section 3.3.1 different stereo systems are described (e.g. pan-tilt, tilt-pan, etc.); the simulator can switch among all these different behaviours. The results presented in the following consider a situation in which each eye can rotate around an arbitrary axis, chosen so as to obtain the minimum rotation that brings the ocular axis from its initial position to the target position (see Section 3.3.1).
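A standalone sketch of that computation follows: the axis of the minimum rotation is orthogonal to both the initial ocular axis and the target direction (from the eye towards F), and the rotation angle is the angle between the two directions. The vector type, helpers, and numerical values are our own illustrative code, not the simulator's.

    // Sketch: axis-angle of the minimum rotation taking the initial ocular axis
    // onto the direction of the fixation point F (illustrative code).
    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    static Vec3 cross(const Vec3& a, const Vec3& b) {
        return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    }
    static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
    static Vec3 normalize(const Vec3& v) {
        const double n = std::sqrt(dot(v, v));
        return {v.x / n, v.y / n, v.z / n};
    }

    // The axis is orthogonal to both directions, so no rotation with a smaller
    // angle can align them (degenerate parallel/antiparallel cases omitted).
    void minimumRotation(Vec3 from, Vec3 to, Vec3& axis, double& angle) {
        from = normalize(from);
        to = normalize(to);
        axis = normalize(cross(from, to));
        angle = std::acos(std::clamp(dot(from, to), -1.0, 1.0));
    }

    int main() {
        const Vec3 eye   = {-0.15, 0.0, 0.0};  // left eye, half of a 30 cm baseline
        const Vec3 gaze0 = {0.0, 0.0, 1.0};    // initial ocular axis
        const Vec3 F     = {0.5, 0.2, 5.0};    // fixation point (XF, YF, ZF)
        const Vec3 target = {F.x - eye.x, F.y - eye.y, F.z - eye.z};

        Vec3 axis; double angle;
        minimumRotation(gaze0, target, axis, angle);
        std::printf("axis = (%.3f, %.3f, %.3f), angle = %.3f rad\n",
                    axis.x, axis.y, axis.z, angle);
        return 0;
    }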