NDT.net May 2004 Vol. 9 No.05
CT-IP 2003 Proceedings
On Human Perception of 3D Images
Kurt Osterloh, Bernhard Redmer, Uwe Ewert, BAM, Berlin, Germany
Abstract
Three-dimensional data, e.g. from tomography, are usually presented as two-dimensional images to make them accessible to the human mind. Human perception of these images entails more than optical sensing on the retina of the eye. Shape, patterns, colour and motion are processed and analysed simultaneously in the brain. The impression of space from a two-dimensional canvas presentation can be generated in different ways: by shadows, partial occlusion, angular evaluation and binocular viewing. However, deceptiveness is inherent in any spatial presentation in two dimensions. As an alternative to sophisticated devices mimicking binocular viewing, the lack of the third dimension may be partly compensated by taking advantage of the time domain in serial presentations of single scenes resembling a film. In practice this is achieved either by a series of sections through the virtual object or by twisting and turning a (re)constructed body.
Introduction to visual perception
Tomographic as well as laminographic methods yield sets of three-dimensional data which have to be presented to the human mind. This means visual presentation of spatial objects, usually on two-dimensional media. On the other hand, optical detection on the retina of the eye is only the first step of the perception process. Knowledge of the subsequent steps has largely been gained from well-documented pathological states of brain dysfunction and lesions, apparent as confined blackouts in the visual field or as forms of agnosia. In addition, visual sensation needs finely tuned neural control to protect the mind from being confused by a flood of flickering sequences of single scenes, mainly due to the motion of the individual. The organisation of the perception process and its allocation to anatomical structures is shown schematically in Fig. 1.
Transforming light into perceivable signals in the eye is the first step in perception. Details pertaining to the anatomy and physiology of the eye can be found in textbooks and internet presentations (/1/ - /5/). Some basic facts such as retinal structure and function are relevant for understanding the perception process. In particular, one should be aware that separate structures collect pictorial details (cones, which are also colour sensitive) and rapid changes in contrast due to moving objects (rods, which are highly sensitive but insensitive to colours). A first contrast enhancement is already achieved within the retina by lateral junctions. Similarly, some receptor cells (rods) are grouped together in order to increase sensitivity and to analyse fast changes due to moving objects. As a consequence, multiple parallel streams of information on patterns (including colours) and motion are transferred to the next stage within the thalamus. Here, in the anatomical structure of the lateral geniculate body, six copies of the image, alternating between the two eyes, are held in separate layers. This anatomical structure is sketched here as a stratified box. In real anatomy these layers are bent like a knee (hence geniculate). However, the function is clearly stratified. Images in the first two (bottom) layers are crudely structured but highly sensitive to changes. In technical terms, they have a low spatial resolution and respond quickly to changes, which makes them well suited to motion analysis. Conversely, the upper layers hold images with a better resolution, preferentially for resolving fine structures. Another problem tackled at this site is that the eyes are constantly in motion, unlike a camera on a tripod. As a consequence, the optical image on the retina is permanently flickering around. If we were aware of this, it would drive us crazy. All these shifts are compensated by feedback from the control of concurrent eye and body motions. This is achieved by multiple links to the active motion control centres. As a result, we perceive our environment as static.
After passing the thalamus, historically regarded as something like the gateway to the mind, the visual information is transferred to the primary visual cortex (V1 or area 17). Neurones from the thalamus are interlaced at this site so that the images from the two eyes are overlaid. As a consequence, disparities due to binocular viewing become evident, forming a prominent basis for spatial perception. Here we also find structures sensitive to preferred directions. Evidently, image analysis and interpretation start here. Subsequent cortical areas (V2 ... or area 18 etc.) process image structures and motion in parallel. Structural information is compared with memorised patterns for further identification. It is assumed that perception creates a Gestalt, which is an abstract form or figure rather than a presentation of specific image data. This could enable us to recognise and identify e.g. a person by his or her face. Dysfunction of the corresponding anatomical structure leads to prosopagnosia, i.e. the inability to identify a person by the face, although the face as such is evidently recognised, as shown by the ability to point at eyes, nose, mouth etc. Sensations of motion are processed in parallel and provide stimuli to change our focus of attention. If we become aware of a moving object we may redirect our gaze. In summary, human perception is linked to different anatomical and physiological structures in charge of optimising a received static image, analysing structural patterns and motion separately, overlaying binocular images, noticing prevailing directions and associating the received information with the content of memory.
Spatial perception is based in principle on three entirely different foundations: understanding perspective in drawings, binocular viewing, and employing the time domain by successively collecting impressions from different aspects. These principles are also linked to separate physiological structures. Evaluating perspective is evidently a cognitive process, so it is allocated functionally behind the primary visual cortex, but it certainly needs the information on directions elaborated there. Binocular vision is clearly a matter for the primary visual cortex, where the images of the two eyes merge. Time courses are processed in parallel to pattern and colour recognition, from the retinas down to the corresponding cortical areas, but by different kinds of neural cells. All three principles contribute to creating the impression of a three-dimensional space. In the real world, visual sensation is the primary tool for orientation in our environment. We use the same tool to comprehend abstract data, such as those originating from tomographic techniques, via optical visualisation. As a consequence, the images created for this purpose should resemble the natural environment in the way we are used to perceiving it. Addressing all three principles simultaneously will hardly be possible technically, nor will it be economical in most cases. A spatial impression can be created sufficiently by utilising only one or two of them. How this can be achieved is shown below.
Planar images producing spatial imagination
Artists have always been concerned with illustrating their perception of the spatial environment on a canvas with only two dimensions. In doing so they encountered the difficulty of reducing the number of dimensions. The solution was to emphasise all pictorial details that evoke a spatial illusion, such as light and shadow or a careful arrangement of details. Naturally, distant objects are partially hidden behind those in the front of the scene. This partial occlusion of objects in the background must still allow them to be perceived. Fig. 2 presents a first approach, showing how shading can produce the impression of an embossed structure. It uses the sensation of directions together with our experience of the course of light and shadow. Images of this kind are frequently generated in radiology by filtering to enhance details. However, the resulting relief-like structures do not have anything in common with the original specimen.
Incomplete contours are easily completed by the visual mind, as shown in Fig. 3. This ability may help in understanding the arrangement of solid black and white blocks in the upper part of Fig. 4. In radiography all specimens become transparent, which does not necessarily support their perceptibility, as shown in the lower part of the same figure. The black and white blocks were made of transparent PMMA covered with black and white tape, respectively. With these tapes removed, the image appears somewhat more confusing (at least to some of us). The only indication of how the blocks are arranged can be found at their bottom, by recognising the perspective arrangement of the edges.
Probably the most effective tool for generating a spatial impression on a canvas is to carefully reconstruct perspective by introducing one or more vanishing points, as shown in Fig. 5. The fundamental principle is that distant objects simply appear smaller than those nearby. As a consequence, distance can be (roughly) estimated from the height in the field. The term for this approach is angular evaluation. In this context it should be emphasised that the estimation of lengths is anisotropic. Height is always overestimated compared to width. For the sake of human orientation, this makes sense: falling one metre may hurt, unlike a leap over the same distance. In radiography (Fig. 6 a), objects close to the detector plane appear smaller than those located further away from it. The latter, in addition, appear more blurred due to the geometrical unsharpness caused by the focal spot size. On the other hand, humans tend to interpret blurred contours as further away than sharp ones, because details are recognised more easily in objects close by than in those at some distance, independent of their size. Presenting two figures of the same size, one sharp and the other blurred, suggests to an observer that the latter is located further off than the former (Fig. 6 b and c). In paintings, details are presented precisely in the foreground but blurred in the background. Finally, perspective illustrations are able to produce impossible figures (/6/) such as the magic triangle shown in Fig. 7 which, in fact, is a real object (/7/). As a consequence, one has to be aware that perspective presentations have a certain potential to produce false illusions.
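The radiographic size and blur effects described above can be illustrated with the standard point-source projection relations. These formulas are not given in the text and are assumed here: magnification M = SDD / SOD and geometric unsharpness U_g = f (M - 1), where SDD is the source-to-detector distance, SOD the source-to-object distance and f the focal spot size. The function names and the numbers in the usage note are hypothetical.

```python
def magnification(source_detector_mm, source_object_mm):
    """Projective magnification M = SDD / SOD for an idealised point source."""
    if not 0 < source_object_mm <= source_detector_mm:
        raise ValueError("object must lie between source and detector")
    return source_detector_mm / source_object_mm


def geometric_unsharpness(focal_spot_mm, source_detector_mm, source_object_mm):
    """Penumbra width U_g = f * (M - 1) caused by a focal spot of finite size f."""
    m = magnification(source_detector_mm, source_object_mm)
    return focal_spot_mm * (m - 1.0)


def compare_positions(focal_spot_mm, source_detector_mm, object_positions_mm):
    """Return (SOD, M, U_g) for each object position along the beam axis."""
    return [
        (sod,
         magnification(source_detector_mm, sod),
         geometric_unsharpness(focal_spot_mm, source_detector_mm, sod))
        for sod in object_positions_mm
    ]
```

For example, `compare_positions(1.0, 1000.0, [950.0, 500.0])` compares an object near the detector with one halfway to the source: under these assumptions the second is projected larger and with a wider penumbra, which is the opposite of what the eye expects from a blurred contour.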
Binocular viewing
Viewing with two eyes is a fundamental source of spatial perception. Focusing on an object at a given distance requires both optical accommodation of the lens and a convergent adjustment of the visual directions of both eyes. These mechanisms work well for distance estimation within reach of the hands. At greater distances the disparate images of the two eyes are mentally transformed into a spatial impression. The nature of disparity is explained in Fig. 8. All points in space projected onto corresponding spots on the retina are located on a circle, the so-called horopter. All points outside the horopter are projected onto disparate spots on the retinal surface. The resulting images (oculocentric visual directions) are overlaid and processed in the primary visual cortex, producing a single view (egocentric visual direction) in which all disparate details are translated into spatial depth. In this context, it should be noted that the retina, as the optical receptor of the eye, has a spherical shape, unlike the flat photographic plate (film layer) in a camera.
Emulating binocular viewing inevitably requires presenting a separate, disparate image to each eye. This can be achieved either by refracting and/or reflecting the oculocentric visual directions with the aid of suitable optics or by a colour-coded presentation of overlaid images (Fig. 9). In the latter case, disparate patterns are coded e.g. red for one eye and green for the other. Viewing these images requires red/green spectacles that filter out the image intended for the respective eye. A more contemporary approach uses an electronic display showing the two images alternately, together with shutter spectacles that close each eye in turn. Synchronisation can be achieved by infrared links, avoiding the hassle of connecting cables. In any case, emulating binocular viewing essentially requires optical devices. On the other hand, in contrast to tomography, a spatial image can already be generated from just two projections resembling the binocular visual directions. The choice of presentation depends on the object and the purpose of the application. In some cases, if appropriate viewing devices are available, stereoscopy with two projections may be sufficient. In other circumstances, if a complex object needs to be understood from multiple viewing aspects, a full-blown tomography may be necessary instead. This would allow reliable and precise quantitative length measurements in any direction, which is hardly possible with other presentation methods.
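The colour-coded presentation can be sketched in a few lines. This is a minimal illustration, not the procedure used for Fig. 9. It assumes an emissive display, idealised filters that pass exactly one colour channel, and an arbitrary assignment of red to the left eye and green to the right eye; all function names are hypothetical.

```python
def red_green_anaglyph(left, right):
    """Overlay two equally sized grey-level images as one RGB image.

    left, right: lists of rows, each row a list of grey values (0..255).
    Returns rows of (r, g, b) tuples with the left image in the red channel
    and the right image in the green channel; blue stays at zero.
    """
    same_shape = len(left) == len(right) and all(
        len(row_l) == len(row_r) for row_l, row_r in zip(left, right)
    )
    if not same_shape:
        raise ValueError("left and right images must have the same shape")
    return [
        [(grey_l, grey_r, 0) for grey_l, grey_r in zip(row_l, row_r)]
        for row_l, row_r in zip(left, right)
    ]


def seen_through_filter(anaglyph, channel):
    """Idealised spectacle glass: keep one channel only (0 = red, 1 = green)."""
    return [[pixel[channel] for pixel in row] for row in anaglyph]


# Hypothetical one-row example: a bright dot displaced by one pixel
# between the two views, i.e. a horizontal disparity of one pixel.
LEFT_VIEW = [[0, 255, 0, 0]]
RIGHT_VIEW = [[0, 0, 255, 0]]
OVERLAY = red_green_anaglyph(LEFT_VIEW, RIGHT_VIEW)
```

With ideal filters, `seen_through_filter(OVERLAY, 0)` returns the left view and `seen_through_filter(OVERLAY, 1)` the right view, so each eye receives only its own image. Real filters and displays leak between channels, which this sketch ignores.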
Time domain
Since instantaneous viewing of a single scene either may result in erroneous spatial impressions or needs sophisticated optical devices, another way of perception should be considered: the serial presentation of a sequence of pictures, e.g. in a video clip. In other words, information hidden in the third dimension is revealed by utilising the time domain. Even when confined strictly to planar illustration, a stratified presentation, section by section, is simple, straightforward and nevertheless sufficient to generate a spatial impression of a three-dimensional object. Another approach is to place the object (in reality or virtually) on a turntable and let it rotate like a carousel. This is the preferred method for presenting flat structures that are fairly apparent in one direction but scarcely visible in another.
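The turntable idea can be illustrated with a small geometric sketch. It assumes a parallel projection along the viewing axis and a hypothetical thin plate as the flat structure; it only shows how the projected extent of such a structure changes with the rotation angle, not how an actual image series would be rendered.

```python
import math


def rotate_about_vertical(point, angle_deg):
    """Rotate a point (x, y, z) about the vertical y axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


def projected_width(points, angle_deg):
    """Horizontal extent of the rotated points in a parallel projection along z."""
    xs = [rotate_about_vertical(p, angle_deg)[0] for p in points]
    return max(xs) - min(xs)


def turntable_series(points, step_deg=15):
    """One (angle, projected width) pair per frame of a full revolution."""
    return [(angle, projected_width(points, angle))
            for angle in range(0, 360, step_deg)]


# Hypothetical flat structure: the eight corners of a plate that is
# 40 units wide (x), 20 units high (y) and 1 unit thick (z).
PLATE = [(sx * 20.0, sy * 10.0, sz * 0.5)
         for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
```

`turntable_series(PLATE)` yields the full width of the plate at 0 degrees and little more than its thickness at 90 degrees. A single fixed view may therefore show the structure clearly or almost hide it, whereas the rotating series contains both aspects.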
A certain visual phenomenon has rarely been appreciated: the ability to determine e.g. the direction in which an object rotates. With a point fixated in space, it makes a difference in perception whether an object passes in front of or behind the fixation point (see Fig. 10). In the first case, moving from left to right, it crosses the visual direction of the left eye first and then that of the right eye, because the two visual axes have not yet crossed. In the other case, beyond the fixation point where the axes have crossed, the axis of the right eye is hit first and then that of the left eye. These two different time shifts create the impression of passing nearer or further off than the fixation point. This effect can be demonstrated with a simple experiment based on the fact that recognition time depends on intensity. It simply consists of watching a pendulum swinging in a plane perpendicular to the visual direction. Normally, the pendulum is plainly seen to stay within its plane of swing. However, it makes a difference if one eye is shaded with a grey filter (or one lens of a pair of sunglasses). The lower light intensity delays perception. As a consequence, the pendulum apparently crosses the axis of the uncovered eye first and then that of the eye with the filtering glass. To the viewer, the pendulum appears to leave its plane of swing and to rotate on an elliptic orbit. This observation was published by Pulfrich in 1922 (/8/) and is hence known as Pulfrich's pendulum (/9/, /10/). It shows that the time domain contributes to the visual perception of space. Physiologically, this makes sense since our environment is not static at all.
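The apparent orbit of Pulfrich's pendulum can be reproduced with a simple geometric model. This is a sketch under stated assumptions, not the treatment in /8/: the eyes sit at (-e/2, 0) and (+e/2, 0), the bob swings sinusoidally in the plane z = distance, the filtered left eye sees the bob where it was a fixed delay earlier, and the perceived position is taken as the intersection of the two lines of sight. The function names and all numerical values are hypothetical.

```python
import math


def apparent_position(x_left, x_right, distance, eye_separation):
    """Intersect the two lines of sight and return the perceived (x, z).

    x_left / x_right: lateral position in the plane z = distance at which
    the left / right eye currently sees the bob.
    """
    denom = eye_separation + x_left - x_right
    if denom <= 0:
        raise ValueError("lines of sight do not converge in front of the viewer")
    scale = eye_separation / denom
    return (scale * (x_left + x_right) / 2.0, scale * distance)


def pulfrich_orbit(amplitude, period, delay_left, distance, eye_separation,
                   samples=8):
    """Perceived (x, z) positions over one period, left eye delayed by delay_left."""
    omega = 2.0 * math.pi / period
    orbit = []
    for k in range(samples):
        t = k * period / samples
        x_right = amplitude * math.sin(omega * t)                # undelayed eye
        x_left = amplitude * math.sin(omega * (t - delay_left))  # filtered eye
        orbit.append(apparent_position(x_left, x_right, distance, eye_separation))
    return orbit


# Hypothetical values: 0.3 m amplitude, 2 s period, 15 ms delay,
# 2 m viewing distance, 65 mm eye separation.
ORBIT = pulfrich_orbit(0.3, 2.0, 0.015, 2.0, 0.065)
```

In this model the bob appears behind its true plane while it moves from left to right and in front of it on the way back, so the perceived path opens into a closed loop. With the delay set to zero, every perceived position falls back into the plane z = distance.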
Conclusion
Visual perception is the primary human interface with the environment. Hence, it is our most valuable tool for orientation. The pathway of vision can also be utilised for comprehending abstract data derived from spatially scanning an object, as in tomography. The prerequisite is an appropriate visual presentation that can be perceived as a spatial image. Choosing a canvas-like medium means reducing the three spatial dimensions to the two present in the plane. Pictorial characteristics such as light and shadow, illusory contours and perspective are essential for establishing spatial perception but have the potential to generate erroneous illusions, as can be demonstrated by purposely misleading images. Since both the perception of time series producing a spatial impression and the abstraction forming a Gestalt contribute to the understanding of space, the time domain may also be considered as a tool for the perception of information in three dimensions. This can be achieved either by consecutive presentations of cut layers or by a video presentation of a tumbling object. Contemporary hardware and software development may certainly favour this kind of presentation of abstract spatial objects. Layer thickness determinations converted to images with partial occlusions (e.g. /11/) may even make comprehension easier and improve the forming of a Gestalt.