We began this work to devise a way for a computer system to communicate effectively by generating visual material. Rather than develop a graphics system that simply visualized objects, we approached this problem as one in communication, addressing similar issues as have been addressed in natural language generation. Our aim was to design a system that generated visual material to achieve communicative intent. Stylistic choices would be made to help achieve the communicative intent and would determine all aspects of the visual material.
We narrowed the scope of this thesis to the automatic generation of 3D illustrations designed to explain the use, maintenance and repair of objects. We felt that research along these lines could enhance a wide variety of applications that currently rely on communication through the use of visual displays. Such applications range from ordinary computer programs to augmented reality systems. Although it might not make sense to integrate our entire system into all these applications, they could all benefit by using at least some of the techniques presented in this thesis.
Our aim has been to present general structures for both an illustration system and a visual language, as well as modes for different types of interaction. We implemented IBIS as a proof-of-concept of our ideas. As we have shown, IBIS is capable of generating illustrations for a number of domains and different applications. For COMET, IBIS generates fully shaded color graphics on a high-resolution display. For KARMA, IBIS generates 3D bi-level illustrations that are presented on a see-through head-mounted display and overlaid on the user's view of the world. We have also shown that IBIS is capable of supporting different modes of interaction.
During the course of this research we developed:
a methodology and architecture for intent-based illustration
a visual language for intent-based 3D illustration
methods for creating composite intent-based illustrations
interactive intent-based illustrations
We summarize each contribution in the following sections.
Our methodology presents a new approach to the automatic generation of 3D illustrations. First, instead of requiring that the input specify what objects to show and how they should be shown, we use communicative intent as input. Communicative intent describes the desired meaning of the illustration: what we want the user to understand as our meaning. Communicative intent can specify that we want the user to understand an object's location. Communicative intent can also specify that we want the user to perform an action on an object in the real world. Effective communication relies on a communicator's ability to express something so that it will be correctly interpreted. Our system is designed to compute the communicative value of the illustrations it produces. Returning to the first example, the system determines whether communicative intent is successfully achieved by evaluating the illustration to determine whether the visual effects being used to convey location are successful. In the second example, the system determines whether communicative intent is achieved by monitoring the user's actions in the real world to determine whether the action has been completed.
Our methodology is based on the decomposition of the illustration task in a manner similar to that used in natural language generation. Our architecture was developed to support constant reevaluation and modification. This generate-and-test approach is designed to reflect the trial-and-error nature of human illustration. The illustration tasks are assigned to different components, and this division of labor is reflected in the two distinct bodies of knowledge these components use.
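To make this generate-and-test cycle concrete, the following sketch shows the control structure such a component might follow: propose a design, evaluate every goal against it, and backtrack by demoting the rules that led to violations. All names (goals, rules, applicable, evaluate, threshold) are hypothetical stand-ins and are not drawn from the IBIS implementation.

    # A minimal generate-and-test sketch; hypothetical names throughout.
    def generate_and_test(goals, rules, max_attempts=10):
        """Propose designs until every communicative goal tests as achieved."""
        for _ in range(max_attempts):
            # Generate: map each goal to the highest-preference applicable rule.
            design = {goal: next((r for r in rules if r.applicable(goal)), None)
                      for goal in goals}
            if None in design.values():
                return None              # some goal has no applicable rule: fail
            # Test: reevaluate every goal against the design as a whole.
            violated = [g for g in goals if g.evaluate(design) < g.threshold]
            if not violated:
                return design            # all goals achieved
            # Backtrack: demote the rules chosen for the violated goals and retry.
            demoted = {design[g] for g in violated}
            rules = [r for r in rules if r not in demoted] + list(demoted)
        return None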
Since visual material comprises a set of visual effects, we chose to decompose the illustration task into the following subtasks: 1) choosing the overall design of the illustration, or the set of visual effects to use, and 2) choosing the style of the illustration, or how each visual effect is achieved. Thus, communicative goals represent communicative intent; style strategies represent visual effects; and illustration procedures represent how visual effects are accomplished. We define design as the mapping of communicative goals to style strategy goals, and style as the mapping of style strategy goals to illustration procedures. Components of type illustrator apply design rules to complete their task; components of type drafter apply style rules to complete their task; the illustration component itself applies procedures to modify and analyze the values that define the computer-generated picture. We have shown that this modularity has many advantages and can be exploited to support interactivity. For example, it facilitates the implementation of different backtracking schemes suited to the particular situation. During user navigation or when illustrating changing worlds, the drafter reports when an asserted subgoal has been violated, but it is the illustrator who determines whether or not the illustration needs to be redesigned.
The visual language we present consists of communicative goals, style strategies, illustration procedures, design rules, and style rules. This language is designed to support the decomposition of the illustration process described above. It is meant to be general, and provides a framework that can be used to accomplish illustration tasks we did not address. Two types of rules are used to drive the system. Methods represent how to accomplish goals and thus are used to generate. Evaluators represent how to determine how well a goal is achieved and thus are used to test. We began this project with the set of communicative goals generated by COMET's content planner. We created corresponding methods and evaluators for each of these goals so that the illustrations that the system generated would be comparable to the illustrations found in the technical manuals describing the maintenance and repair of the army radio. When we modified the system for KARMA, we introduced new rules for this interactive application. These were easily combined to represent the tasks involving the use of a laser printer. The system, however, is general.
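As an illustration of the method/evaluator split, the sketch below pairs a style method with its evaluator for a highlight strategy. It is written in Python with invented names (brightened, contrast_against_neighbors); it is not the thesis's rule-language syntax, only an indication of how generation and testing divide the work.

    # A hedged sketch of a method/evaluator pair; names are invented for
    # illustration and do not reflect the actual rule language.
    def highlight_method(obj, illustration):
        """Method (generate): accomplish 'highlight' by brightening the object."""
        obj.material = obj.material.brightened(factor=1.5)
        illustration.add(obj)

    def highlight_evaluator(obj, illustration, min_contrast=0.3):
        """Evaluator (test): score how well the object now stands out."""
        contrast = illustration.contrast_against_neighbors(obj)
        return min(1.0, contrast / min_contrast)   # 1.0 means fully achieved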
The communicative goals are general. The rule language is parameterized: additional parameters allow the primitives to be specialized for each domain. For example, the goal to show a change in an object's state can include additional parameters describing the action that causes the change in state, the agent who performs the action, and so on. The rule language allows us to depict a change of state differently if, for instance, the object was pushed rather than turned. Even these specializations are general. The rule language also allows object-specific conditions. Rules using these conditions treat objects of a particular class differently. Turning a dial with discrete settings could be treated differently from turning a continuous dial. The style strategies are also general and describe basic illustrative techniques. They, too, can be augmented to create more specialized effects. For example, different arrow styles can be selected to describe different actions, such as the circular arrow used to show how a screw is loosened. The addition of specialized rules does not restrict the system from finding an adequate solution, because any preference for a specialized rule can be overridden when communicative goals are violated.
Additionally, each representation level is free from the dependencies of the others. Thus, communicative goals are media-independent. The design rules are output-display independent. The style rules are machine-independent, or graphics library independent. The illustration procedures are object-independent. Modifications can be made to any portion of the system without modifying other parts of the system. For example, a new communicative goal can be introduced by creating design rules using existing style strategies. A new style strategy can be introduced by creating style rules using existing illustration procedures. A new object can be introduced without modifying the illustration component. In order to support a new output display, only the portion of the illustration component that utilizes a graphics library needs to be modified.
The communicative goals represent the high-level concepts to be visualized, such as the objects or concepts themselves, spatial relations, properties, change of state, and actions. Designs represent how style strategies are combined to depict high-level concepts. The style strategies are used to generate the illustration-objects that comprise the illustration. They are also used to determine the other values that specify a computer-generated picture. These values are determined to support the decisions concerning how every illustration-object should appear in the illustration. Thus, we do not have a style strategy to specifically choose a view specification. Instead, the view specification is selected as different style methods are applied. Similarly, we do not have a style strategy to color an object. Instead, an illustration-object is assigned a material definition and rendering instructions as the style methods are applied.
The particular primitives we selected allow us to determine how a concept is depicted based on several considerations. For example, a concept's geometric representation is determined by applying style methods for include, context, meta-object, and find. How the representation is viewed is determined by applying style methods for visible and recognizable, or by the user-supplied view specification or information provided by the head-tracker. How each representation is positioned and animated is determined by applying style methods for move, label, meta-object, and ghost, and by using the state information and the information provided by the object trackers. What properties are depicted is determined by applying style methods for recognizable and visual-property, and by using the information in the object representation. What visual effects are used to modify an object's representation is determined by applying style methods for highlight, focus, subdue, ghost, and visible.
All of the decisions that determine these values are driven by the process to achieve communicative intent. Thus the illustration-objects are created to visualize the concepts represented in the communicative goals and the rendering instructions, lighting, and view specification are selected to accomplish the subgoals associated with each of these illustration-objects. Thus, our system is capable of generating all the values that define a computer-generated picture: the view specification, lighting, illustration-objects, and rendering instructions. The system was designed so that every decision helps support the communicative intent and that no decision violates the communicative intent. This functional approach is based on the integrity of the visual language.
Human illustrators create composite illustrations to better achieve communicative intent. We implemented two types of composite illustrations. Series are used to show a sequence of steps, a temporal sequence of events, or a causal relationship. Insets are used to show additional information that cannot be shown in the background illustration. Rules determine when a composite illustration should be generated. In natural language a text-generator can opt to generate compound sentences, separate sentences, paragraphs and so on. Similarly, our system can opt to generate different types of composite illustrations.
Our system generates composite illustrations when one illustration cannot effectively achieve communicative intent. The architecture handles a hierarchy of illustrators, each assigned a different illustration task. These illustrators inherit from parent illustrators and special rules are used to avoid unnecessary computation and to favor solutions that employ a consistent use of visual cues. The illustrator hierarchy further clarifies the division of labor and enables a distributed implementation, while maintaining a cohesive representation of the composite illustration as a whole.
When the system cannot initially design an optimal solution during interactivity, composite illustrations provide a useful mechanism to satisfy violated goals without confusing the user. While it might be possible to continuously change the way style strategies are accomplished during the course of an animated sequence (such as choosing different highlight colors depending on the objects appearing in the scene), the result would be confusing and annoying. Composite illustrations provide an alternative solution. When the user changes the view specification, or when the objects move so that communicative intent is violated, inset illustrations can be generated to achieve the violated goals. Similarly, composite illustrations can be generated in response to a user's queries. This mechanism can be used in a wide range of applications, from scientific visualization and virtual reality systems to augmented reality systems. For example, an inset illustration can be generated to provide context information when a user is lost.
We implemented four types of interaction to enhance communication, and thus demonstrated how our architecture and methodology support these different types of interaction.
First, the user is not restricted to the role of a passive viewer. We demonstrated two ways in which users explore the illustrated environment. In the KARMA domain, the user is free to walk about the real world and look in whichever direction he or she pleases. In order to continuously satisfy communicative intent, the graphics overlaid on the user's view on the head-mounted display are redesigned. This mechanism is designed to help the user achieve goals. When viewing the graphics on a high-resolution graphics display, the COMET user is provided with a simple graphical interface to navigate in the illustrated environment by changing the values that define the view specification. In order to satisfy communicative intent, the illustration is continuously evaluated and modified to satisfy violated goals, such as the visibility of a particular object.
Second, the communicative intent can change. During the course of an interactive session, the COMET user can ask for additional information. COMET's content planner can consequently assert new communicative goals that are assigned to the graphics generator. IBIS modifies the illustration to satisfy the new goals. The COMET system also determines when an explanation is not effective given the user's actions. In this case, a goal's importance is increased and IBIS modifies the illustration to better satisfy the goal. The KARMA user's actions are used to evaluate a goal's success. Once the system determines that the user has completed a step in a task, the corresponding goals are deactivated and the goals associated with the next step are assigned to the illustrator.
Third, the world depicted need not be in a frozen state, but can be continuously changing. In KARMA, the illustration-objects are updated to reflect the movement of the corresponding objects in the real world, thus maintaining consistency between the user's view of the real world and the projected images of the objects on the see-through display. Each change triggers a reevaluation of the illustration to detect and handle violated goals. When objects reach identifiable states, rules may be triggered so that certain concepts are depicted (such as the roll of the dice) or goals may be modified (when an action is completed). This mechanism is useful for augmented reality systems, which design communicative material based on the current state of a changing world and the user's interaction with it.
Fourth, not only can end users interact with IBIS, but external modules can interact with IBIS as well. The same components that design and maintain illustrations for interactivity can be invoked to evaluate and report different properties of the displayed or planned illustration. An illustrator can report how well any communicative goal is being achieved and which design method is being applied to achieve it. For example, the illustrator can report how well a particular object's location is shown in an illustration. This information can be used by a media-coordinator that determines when it is necessary to reassign goals among the media-generators and relax constraints. Similarly, a drafter can report how well a style strategy is achieved and which style method is being applied to achieve it. For example, the drafter can report how effectively an object is highlighted and that it is being highlighted by changing its lighting. Illustration procedures return information about the viewport, view specification, lighting, illustration-objects, and rendering instructions. Thus, an illustration procedure can return the list of objects that occlude an object. This information is used in COMET to better coordinate the text and graphics by coordinating sentence breaks with picture breaks and by generating cross-references referring to properties of the illustrations, such as an object's spatial relationship (the knob on the left), visual cues (the battery in the cutaway), visual properties (the green radio), and so on [Feiner and McKeown 90].
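The sketch below suggests the kind of query interface such reporting implies; the class and method names are hypothetical and do not reflect IBIS's actual interface.

    # A sketch of the reporting interface a media-coordinator might call.
    # All names are hypothetical.
    class IllustratorReport:
        def __init__(self, illustrator):
            self.illustrator = illustrator

        def goal_status(self, goal):
            """How well a communicative goal is achieved, and by which method."""
            return {
                "goal": goal.name,
                "achievement": self.illustrator.evaluate(goal),   # e.g. 0.0 to 1.0
                "method": self.illustrator.current_method(goal).name,
            }

        def occluders(self, obj):
            """The objects that occlude a given object in the current view."""
            return self.illustrator.illustration.objects_occluding(obj)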
In the following sections we will describe research topics that would enhance visual communication.
As we extended the system, new levels of rules and primitives were developed. For example, the rules that guide the generation of composite illustrations were included with the meta-rules. The illustration task could be decomposed into even more levels to more accurately reflect the different levels of decision making. However, this would complicate the system before some more urgent issues are addressed. It would be prudent to enhance the interaction between the existing components before introducing new ones.
A new class of goals should be developed, however, to better represent general styles. We used a simplistic ordering scheme for representing style preferences. We need to represent the collections of stylistic choices that together are recognizable as one particular style. Then, our system could be used to produce illustrations that are truly in one particular style.
These goals could also be used to generate illustrations that map, for example, temperature values to colors. The advantage of such an approach would be that the color mapping would adapt to the particular situation. Thus a color mapping scheme would be incrementally created to ensure that the different temperature values were recognizable. This would require new forms of evaluation that integrate rules about color interaction and perception as well as the color mapping technique.
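A minimal sketch of such an adaptive mapping follows. It assigns hues along a cold-to-hot ramp and nudges any hue that would be too close to one already assigned; plain RGB distance stands in for a proper perceptual metric, and all thresholds are illustrative assumptions.

    import colorsys

    # Incrementally build a temperature-to-color mapping so that the assigned
    # colors remain mutually distinguishable. Thresholds are illustrative.
    def build_color_map(temperatures, min_distance=0.15):
        lo, hi = min(temperatures), max(temperatures)
        span = (hi - lo) or 1.0
        color_map, used = {}, []
        for t in sorted(set(temperatures)):
            hue = 0.7 * (1.0 - (t - lo) / span)      # blue (cold) to red (hot)
            rgb = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            # Nudge the hue if the color is too close to one already in use.
            while any(sum((a - b) ** 2 for a, b in zip(rgb, u)) ** 0.5 < min_distance
                      for u in used):
                hue = (hue + 0.05) % 1.0
                rgb = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            color_map[t] = rgb
            used.append(rgb)
        return color_map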
New rules could be introduced to determine when the generation process should continue, even when all communicative goals have been achieved, to achieve goals of aesthetics and style. We showed one example of such a rule: the rule for consistent views. Other rules could be used to post-process the components of a composite illustration to ensure that, for example, arrow-style was consistent.
However, other rules are needed to examine the illustration as a whole to determine if the many special effects are confusing (imagine an illustration with many highlighted objects) and cluttered (an illustration with several arrows and labels). Thus the design rules should also incorporate general knowledge about composition and aesthetics like that found in [Tufte 83], [Tufte 90], and [Holmes 91]. These rules would specify general properties to avoid.
While the set of communicative goals can be easily used to describe simple operations and relations, they do not reflect the wide range of human expression. Thus, our representation for communicative intent is incomplete. We need to be able to specify emotion and urgency, as well as represent the interrelation between the different parts of the communication. This is a challenging area of research.
Research efforts in natural language generation have shaped the way we represent content in a way that is perhaps too closely linked to verbal language and the needs of a natural language generation system. For example, problems arise when the content plan refers to visual attributes. When a communicative goal specifies the property color, it is unclear whether it refers to the object's color in default lighting, a general class of color, or the color value of the object as it appears in the illustrated 3D scene. It would be interesting to determine how much of a content plan is designed for text generation and which concepts were added just for graphics. It would be challenging to take a step back and devise representation schemes that are truly medium-independent, yet disambiguate concepts that have potentially different meanings when they are communicated by different media.
A challenging area of research is to create new models for interaction with visual material. Often user interaction is limited to traditional input devices (mouse, keyboard) that are used to control simple graphical devices and enter commands. The means for a user to interact with visual material should not simply emulate the traditional modes of dialogue with computer programs. We have shown how the user's actions can be used as input. Other devices can be used to better monitor the user's responses and actions, providing data that can be used to enrich the system's response to the user. This information can be used to build new models for dialogue used by more adaptive and intelligent systems.
For example, the user can be equipped with an eye-tracking device, so that the system can monitor how the user scans or looks at the illustration. Since our methodology maintains a description of the relationship between areas of the illustration and what is being communicated in the three levels that define our visual language, new methods can be developed to judge the effectiveness of the presentation. These mechanisms could be used by a learning module that determines, on a user-by-user basis, which visual cues are more effective. A style preference could be developed for each user based on his or her response to the visual cues being used. For example, it would be relatively straightforward to determine how long it takes for a user to look at a highlighted object. The system could use this information to rate the various highlight methods in terms of effectiveness for a particular user.
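A small sketch of such a rating follows, assuming gaze data arrives as timestamped fixation events; the event format and the scoring formula are assumptions made purely for illustration.

    # Rate a highlight method by how quickly the user fixates the highlighted
    # object after the highlight appears. Event format and scoring are assumed.
    def rate_highlight_method(gaze_events, highlight_time, target_id, max_latency=5.0):
        """gaze_events: list of (timestamp_seconds, fixated_object_id) tuples."""
        for timestamp, fixated in gaze_events:
            if fixated == target_id and timestamp >= highlight_time:
                latency = timestamp - highlight_time
                return max(0.0, 1.0 - latency / max_latency)   # 1.0 = immediate
        return 0.0                                             # never fixated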
In KARMA, the evaluations provided by the illustration system can be used to judge the effectiveness of the presentation generated by other media-generators. If the system generates spoken text, the illustration system can return information which can be used to determine if further explanation is needed. For example, the user may be told to check a gauge before performing an action. The system can determine if the user turned his or her head to look at the gauge before proceeding to turn the dial. This information can be used to determine if it is necessary to issue further warnings.
The evaluation procedures can be modified to adapt to different users. A simple, but important, enhancement would be to record the user's distance from the graphics display and use this information (along with the characteristics of the display) to better calculate objects' legibility. Currently, IBIS assumes that the user is at a hardwired distance from the display when computing legibility. Similarly, we could introduce new evaluators, designed for color-blind users, to compute, for example, the contrast between objects.
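A sketch of a distance-aware legibility test follows: it converts an object's on-screen extent into the visual angle it subtends at the viewer's eye and compares that against a minimum. The threshold and display parameters are illustrative assumptions rather than values from IBIS.

    import math

    # Is an object's on-screen extent large enough to be legible at this distance?
    # Threshold and display characteristics are illustrative assumptions.
    def legible(object_height_px, display_height_px, display_height_mm,
                viewer_distance_mm, min_arc_minutes=10.0):
        object_height_mm = object_height_px * display_height_mm / display_height_px
        visual_angle_rad = 2.0 * math.atan(object_height_mm / (2.0 * viewer_distance_mm))
        return math.degrees(visual_angle_rad) * 60.0 >= min_arc_minutes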
We can also represent a user's level of expertise or interest and introduce new rules that determine how objects should be represented. In natural language, the user model determines both the content of the text generated (how much information needs to be packed into the explanation) and the surface form (what terminology is used or how descriptive the phrases describing an object need to be) [Paris 93]. Similarly, we can represent the presentation tools with which the user is most familiar. For example, an electrical engineer can be assumed to be familiar with the conventions used in circuit diagrams. A mechanical engineer can be assumed to be familiar with the conventions of technical drawing. This information can be used to select the style of the presentation.
Previous discourse should also influence the methods applied to satisfy communicative intent. Currently we maintain a history of the methods that have been applied in a session. These methods are reordered for each new illustration so that methods that have been used before are preferred. Similarly, we could include in a user model a record of the information he or she has been presented with. This could be used to relax the thresholds for the location and recognizability goals, similar to how APEX ceases to provide extra context information once an object has been introduced.
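A one-function sketch of this reordering, with hypothetical names:

    # Stable-sort candidate methods so those already used this session come first,
    # keeping visual cues consistent across illustrations. Names are illustrative.
    def prefer_previously_used(methods, history):
        return sorted(methods, key=lambda m: 0 if m in history else 1)

    # e.g. prefer_previously_used(["arrow", "blink", "brighten"], {"brighten"})
    #      returns ["brighten", "arrow", "blink"]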
We have applied simple rules for layout. For example, inset illustrations are positioned over the objects in the background illustration. Currently, we can detect when this positioning violates visibility goals and use this information to reposition inset illustrations. While the arrangement of the non-overlapping components of a composite illustration is handled by a layout manager (as in COMET), it is unclear which component should lay out the inset illustrations. Currently, IBIS does so, but it probably should not. If the layout manager is enhanced to truly mix the text, graphics, and output from other media-generators (speech, audio), and even typesets the text over the graphics (or vice versa), then it must take on the responsibility of using knowledge about the different media and how they should be combined. The graphics system will not be able to easily determine whether or not goals have been violated when, for example, a paragraph is typeset in a corner of the illustration. The layout manager should also incorporate rules for layout, such as those found in [Koren and Meckler 89], for composing multimedia material. Thus, the layout manager would be elevated to the role of multimedia-presentation-generator, requiring a body of design and style rules of its own as well as a set of complex multimedia-specific evaluators. What is interesting is that these evaluators will be based on (and call) each generator's evaluation capabilities.
We need to explore the types of false implications present in 3D illustrations and create evaluators to detect them. For instance, many knowledge-based systems position objects. However, the positioning of an object can lead to false implicatures. Consider the problem presented by the use of distinguishable locations. Suppose that we wish to communicate that a traveler will find his or her suitcase on the conveyor belt in the luggage area of an airport. An illustration of the conveyor belt with a suitcase on it may suffice, but the exact location of the suitcase is unknown. Care must be taken to choose a location on the conveyor belt that does not introduce false implicatures. A distinguishable location, such as the center of the conveyor belt, could be interpreted to falsely imply that the suitcase can be found only in the middle of the belt, while a non-distinguishable location may be correctly interpreted to be random, as shown in Figure 8-1. Similarly, alignment of certain objects may imply a false relationship between the objects [Marks 90].
Figure 8-1. Distinguishable and indistinguishable positions on a luggage carousel. (The figure contrasts a distinguishable point at the center, a distinguishable point at the top left, and an indistinguishable point.)
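One way such an evaluator might avoid false implicatures is sketched below: sample candidate positions on the supporting region and reject any that fall too close to distinguishable points such as the center or corners. The rectangular region and the margin are simplifying assumptions for illustration.

    import random

    # Choose a non-distinguishable position by rejecting samples that fall too
    # close to distinguishable points (center, corners). Illustrative sketch only.
    def indistinguishable_point(width, height, margin=0.15, tries=100, seed=None):
        rng = random.Random(seed)
        special = [(width / 2, height / 2), (0, 0), (width, 0),
                   (0, height), (width, height)]
        min_dist = margin * max(width, height)
        for _ in range(tries):
            x, y = rng.uniform(0, width), rng.uniform(0, height)
            if all(((x - sx) ** 2 + (y - sy) ** 2) ** 0.5 >= min_dist
                   for sx, sy in special):
                return (x, y)
        return None   # no acceptable sample found; caller must fall back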
An interesting and straightforward task would be to enhance communication systems so that they can describe themselves. Thus, a graphics generator could illustrate itself, especially when it is active. This would be useful for debugging and tracing rule activations. The visual presentation should show the multiple representations of the illustration being designed corresponding to the three levels of representation employed by the system. The visual presentation could also show the relationship between each object and the objects in the object-base.
Human illustrators use different media to express different things. We need to better understand what different types of media, such as pen and ink, pencil, airbrush, charcoal, watercolor, or even different cross-hatching styles, are better at expressing. A challenging research direction is to represent the expressive qualities of different media: what each medium is good at communicating and the computer graphics rendering techniques that emulate it. Intermediate results along these lines could be used to produce illustrations that truly mix different styles. For example, an object can be rendered so that one part of it appears in fine detail while the rest of the object is depicted with very little detail. This effect can be accomplished by rendering the detailed area in a style that emulates a finely detailed airbrushed ink drawing, while the other portions are rendered as though they were drawn with broad charcoal strokes. As the number of tools for transforming images so that they appear to have been created using different media grows, there appears to be little effort to provide users with insight into why it is to their advantage to use these media in different situations.
Unfortunately, our most difficult (and time-consuming) task was to develop the illustration procedures to both produce the visual effects and analyze them. While we would have liked to have more time and support to create and combine different illustrative techniques, we were hindered by the lack of tools to emulate even simple effects. We settled for solutions that compromised accuracy for speed. The issue of processing time gains greater importance when the system is used interactively. This problem points to a genuine need for object-oriented graphics systems with firmware support that 1) do not divorce the rendered computer-generated picture from the instructions that specify it and 2) provide a much richer set of capabilities. While graphics systems are optimized to render polygons at blazing speeds, they are barely capable of performing the simple tasks our methods and evaluators required within our time constraints.
The limited functionality provided by standard graphics libraries presented the greatest obstacles encountered during the course of this work. Our system relies on procedures that analyze the illustration. Object-oriented graphics libraries and graphics firmware should be enhanced to provide a number of analysis routines based on the interaction of various objects and elements in a rendered graphic. The set of style strategies presented in this thesis is a good starting point. They could be treated as constraints during rendering and associated with objects in a display-list. For example, there should be procedures to automatically classify the objects in a scene for occlusion and to segment lines and surfaces based on levels of occlusion. These routines would be used to automatically generate cutaway views and other visual effects based on the visibility constraints associated with each object, like the transparent views generated by the Hyper-Renderer [Emhardt and Strothotte 92].
Graphics systems should have visibility and recognizability constraint capabilities. Each object in a display list could be enhanced to include the constraints associated with the way it should be viewed. Before rendering, the system could tag objects and constraints that are violated so that the system has the opportunity to modify the illustration.
It will still be a very difficult task to describe these constraints. How, for example, should visibility be measured? We computed the visibility of an object using different techniques. The most efficient technique simply reported which objects occluded a particular object. The most extensive computed the area of the object that was occluded. But different objects require different visibility constraints. Metrics need to be devised to determine at what point occlusion is truly detrimental to an object's visibility; using straight percentages does not suffice. In many cases, objects are visible even if only a small part of them can be seen. For example, a view of a screw's head makes the screw visible. Other situations are more difficult to analyze. A house that is seen through the leaves of a tree may be visible, even if most of it is occluded by the tree. A yard seen through the slats of a picket fence may be visible, while the rake on the ground is not. The road seen through a rain-spotted windshield is visible, but the bicyclist is not. And the person sleeping under the covers is visible to the person looking at the bed, even though the person is completely occluded.
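The sketch below illustrates why a single percentage is inadequate: it measures visibility as the fraction of an object's unoccluded projection that survives occlusion, read from an object-ID buffer, and compares it against a per-object-type threshold. The buffer format, object types, and thresholds are all illustrative assumptions.

    # Per-object visibility test from an object-ID buffer (one id per pixel).
    # Thresholds vary by object type; the values here are illustrative only.
    VISIBILITY_THRESHOLDS = {"screw": 0.05, "house": 0.20, "default": 0.50}

    def visibility_ratio(object_id, id_buffer, unoccluded_pixel_count):
        """Fraction of the object's unoccluded projection that remains visible."""
        visible = sum(row.count(object_id) for row in id_buffer)
        return visible / unoccluded_pixel_count if unoccluded_pixel_count else 0.0

    def is_visible(object_id, object_type, id_buffer, unoccluded_pixel_count):
        ratio = visibility_ratio(object_id, id_buffer, unoccluded_pixel_count)
        return ratio >= VISIBILITY_THRESHOLDS.get(object_type,
                                                  VISIBILITY_THRESHOLDS["default"])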
The analysis of an object's visibility depends on a great number of factors, ranging from the type of object and its state to the values that define the current situation. Similarly, what makes an object recognizable to a particular person is very complex. This line of research should be coupled with efforts in cognitive science, semiotics, anthropology, computer vision, and other fields.
Many graphics packages provide a highlight tag that can be set for selected objects in a display-list to cause them to be rendered in a highlight style. However, the highlighting effect is accomplished in a fixed manner, usually by changing line color, blinking, or lightening a color value in a predetermined way. We have shown that the effectiveness of a technique as fundamental as highlighting still depends on all the properties of the picture. The tests that need to be performed to evaluate whether the current technique for highlighting is effective should be provided by the graphics library. For example, routines are needed to quickly determine which pixels are associated with an object. They could be used to determine the contrast between adjacent objects. Techniques for highlighting, such as brightening the area of the picture covering the object, could then be accomplished efficiently during rendering using alpha-blending. Thus the primitive highlight should be adaptive, given the properties of the picture and the object that is to be highlighted.
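An adaptive highlight of this kind might be sketched as follows: measure the luminance contrast between the object's pixels and its surroundings, and raise the blend factor of a white overlay until the object stands out. Pixel values are assumed to be RGB triples in [0, 1], and all the numbers are illustrative assumptions.

    # Adaptive highlight sketch: increase the alpha of a white overlay until the
    # object's luminance contrasts enough with its surroundings. Pixels are RGB
    # triples in [0, 1]; all thresholds are illustrative.
    def luminance(rgb):
        r, g, b = rgb
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    def mean_luminance(pixels):
        return sum(luminance(p) for p in pixels) / len(pixels) if pixels else 0.0

    def highlight_alpha(object_pixels, surround_pixels,
                        target_contrast=0.25, step=0.1, max_alpha=0.6):
        alpha = 0.0
        surround_lum = mean_luminance(surround_pixels)
        obj_lum = mean_luminance(object_pixels)
        while abs(obj_lum - surround_lum) < target_contrast and alpha < max_alpha:
            alpha += step
            # Re-measure after alpha-blending the object's pixels toward white.
            obj_lum = mean_luminance([(r + alpha * (1 - r),
                                       g + alpha * (1 - g),
                                       b + alpha * (1 - b))
                                      for r, g, b in object_pixels])
        return alpha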
The include knowledge should be enhanced to be context-sensitive. For example, if the intent is to communicate the shape of a dial, it is not appropriate to depict the labels that identify its settings and use. But if the intent is to communicate the semantics of that same dial, then it is appropriate to depict its labels. Specifying these constraints is a challenge for the designers of the object representations.
The object-representation should also be enhanced so that alternative representations can be selected. There are many cases in which one way of depicting an object is more suitable than another. In our example of the dice in Chapter 1 we demonstrated how a concept-specific rule is applied to show the roll of the dice. That rule could be augmented to specify that a 2D depiction of the top faces of the dice should be used instead. Similarly, the pieces on a chess board could be represented symbolically when the intent is to show the current state of the game. Adding the methods for generating the alternative representation would not be very difficult. However, devising the methods and evaluators to determine when they should be used is difficult.
Better rules for selecting a view specification should be supported. Hagen [Hagen 91] describes useful rules for generating ecologically valid 3D interfaces for applications of computer-generated visual presentations. These rules are based on the well-founded rules used in Western art from as early as the 15th century. Ecological validity is defined as mimicking the characteristics of the ordinary environment and the ordinary conditions of viewing that environment. The rules of perspective concern the presence of one horizon, the treatment of edges parallel to the ground, the use of the ground plane, viewpoint distance, and so on. These rules are not automatically satisfied when using standard view transformations. The objects in the scene require processing to produce ecologically valid presentations.
We implemented composite illustrations to handle overly constrained illustrations. However, there are many different types of composite illustrations, criteria for their use, and rules for their layout. Each type of composite illustration serves a particular purpose and can be selected to solve particular communicative goals. For example, a split image is commonly used to show the same object from two different angles. Similarly, while it may be possible to show two very distant objects in the same illustration, it may be more appropriate to show each object in a separate illustration.
We implemented rules for only insets and series. Each type of composite illustration should be supported by new rules for partitioning communicative goals and assigning them to the various illustrators in the illustrator hierarchy. When an illustrator fails, composite illustration rules could determine, based on which goal failed, what type of composite illustration to generate. Consider the case when the communicative intent specifies three location goals, for objects a, b, and c. The context object for a is b and the context object for b is c. The relationship between these objects and goals could be detected before the illustrator begins to address the first location goal. Thus, the system could opt, from the outset, to generate a successive-locator composite illustration.
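A sketch of detecting such a chain of context objects follows; the goal representation (a simple obj/context pair) is an assumption made for illustration.

    from collections import namedtuple

    LocationGoal = namedtuple("LocationGoal", ["obj", "context"])   # illustrative

    # If the location goals form a single context chain (a's context is b, b's
    # context is c, ...), return it so a successive-locator composite can be chosen.
    def context_chain(location_goals):
        context_of = {g.obj: g.context for g in location_goals}
        # The chain starts at the one object that is not anyone else's context.
        starts = [o for o in context_of if o not in context_of.values()]
        if len(starts) != 1:
            return None
        chain, current = [], starts[0]
        while current is not None and current not in chain:
            chain.append(current)
            current = context_of.get(current)
        return chain

    # e.g. context_chain([LocationGoal("a", "b"), LocationGoal("b", "c"),
    #                     LocationGoal("c", "cabinet")])
    #      returns ["a", "b", "c", "cabinet"]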
[Meyer 87] defines conformant classes to be classes of objects that are perceived to be the same. For instance, a person hearing two tones may mistakenly perceive them to be of the same pitch. When the tones are heard one right after the other, however, the same person will correctly perceive that they are different. Similarly, two colored areas, if separated by another area, may appear to be of the same color value, although they are not [Albers 63]. If these same two colored areas are adjacent, then the difference in color value is apparent at their boundary. Similarly, the visual cues seen over the course of a session may be erroneously interpreted to be the same or different. A conformant class is a set of objects that are perceived to share the value of some property.
Visual effects depend on the perception of similarities and differences. Techniques for determining an object's membership in a conformant class can be used in the illustration procedures for evaluation. As different aspects of the illustration are modified, whether as it is designed or during interaction, the membership in various conformant classes also changes. For example, a tapered line may appear not to be tapered depending upon the resolution and screen space it occupies. Cuboids and spheres may appear to have the same shape if displayed at low resolution. Conformant classes would classify the perceivable differences in line weight, color value, shape, and so on. Each conformant class would be represented by procedures that determine membership in that class. These procedures could be incorporated in a graphics library.
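A minimal sketch of a conformant-class test for color value follows: two colors belong to the same class when their difference falls below a just-noticeable threshold. Plain RGB distance and the threshold value are stand-ins for a proper perceptual color metric.

    # Conformant-class sketch for color value: members of a class are mutually
    # indistinguishable. RGB distance and threshold stand in for a perceptual metric.
    def same_color_class(rgb_a, rgb_b, just_noticeable=0.08):
        distance = sum((a - b) ** 2 for a, b in zip(rgb_a, rgb_b)) ** 0.5
        return distance < just_noticeable

    def conformant_partition(colors, just_noticeable=0.08):
        """Group colors into classes whose members are pairwise indistinguishable."""
        classes = []
        for c in colors:
            for cls in classes:
                if all(same_color_class(c, member, just_noticeable) for member in cls):
                    cls.append(c)
                    break
            else:
                classes.append([c])
        return classes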
Other rules can be incorporated to manage color interaction. For instance, the rules in [Albers 63], [Imhof 65], and [Itten 73] can be incorporated to better select highlight colors or the color for meta-objects.
Human-computer interaction depends on effective communication. As systems become more complex, presenting users with multimodal, multimedia streams of information and interactive capabilities, so grows the need for finding ways to communicate with users in a coherent, comprehensible manner. Even in the field of multimedia communication it is sometimes easy to forget that the systems being described have anything to do with people.
There is much that we do not know about human communication. But, most of us know much less about visual communication than we know about verbal communication. Nonetheless, that does not prevent us from trying to model forms of visual communication. In this thesis we demonstrated that a functional approach can produce promising results. While the visual language cannot, of course, be used to express the full range of visual communication, it is sufficient for a wide variety of domains and applications. Furthermore, its structure provides a manageable model for the complex interrelationships between the various elements that make up a 3D illustration.
There is also much we do not know about how humans interpret communicative material. Again, that does not prevent us from trying to approximate rudimentary rules of interpretation. We showed two ways we can attempt to approximate how a user will interpret a 3D illustration. First, we represented the conditions on which communicative conventions depend, using evaluators to determine how well visual effects were accomplished. Using these evaluators, our system was able to generate illustrations similar to those found in technical manuals. Second, we represented the desired results of communication in terms of events. Using some simple procedures, our system was able to monitor certain events.
Communication is an exchange of human thoughts and feelings. Technology has helped people overcome the barriers of time and space. But, in their primitive states, these technologies erected new barriers of their own. Each time we scaled one, another appeared. It's time for computers to hear us in our languages, and figure out what it is we are saying. We have approached our first canvas, brush in hand, with this goal. With broad strokes we laid down the foundations of a plan and then took with us what we could find up a ladder and started to work. Our first painting will be pointillist, for at most we can each produce a dot. But our project depends on cooperation and synchronization. Every so often, we will descend from our ladders, stepping back to evaluate our collective work. Some of us will return to our places, while others search for another brush, another color. And sometimes, when the flickering shadows catch our attention, we will stop to consider the people beyond the windows.