
Visually-Guided Grasping and Manipulation

Contact: Bill Yoshimi <yoshimi@cs.columbia.edu>
Human experience provides an existence proof for the ability of vision
to assist in grasping and manipulation tasks. Vision can provide rich
knowledge about the spatial arrangements (i.e. geometry and topology)
of objects to be manipulated as well as knowledge about the means of
manipulation, which in our case are the fingers of a robotic hand.
Our goal is to visually monitor and control the fingers of a robotic
hand as it performs grasping and manipulation tasks. Our motivation
for this is the general lack of accurate and fast feedback from most
robotic hands. Many grippers lack sensing, particularly at the
contact points with objects, and rely on open-loop control to perform
grasping and manipulation tasks. Vision is an inexpensive and
effective method to provide the necessary feedback and monitoring for
these tasks. Using a vision system, a simple uninstrumented
gripper/hand can become a precision device capable of position and
possibly even force control.
This research is aimed at using vision to provide the
compliance and robustness which assembly operations require without
the need for extensive analysis of the physics of grasping or a
detailed knowledge of the environment to control a complicated grasping and
manipulation task. Using vision, we gain an understanding of
the spatial arrangement of objects in the environment without
disturbing it, and can provide robust feedback for a robot
control loop.
Our previous work has integrated both vision and touch
for object recognition tasks.
We would like to extend our object tracking system so that it can be
used to provide visual feedback for locating the positions of fingers
and objects to be manipulated, as well as the relative relationships
between them. This visual analysis can be used to control
grasping systems in a number of manipulation tasks where finger
contact, object movement, and task completion need to be monitored and
controlled. We also want to close a visual feedback
loop around the forces reported by the proposed Barrett hand system.
By visually tracking the fingers and reading forces, we can accurately
position the hand and monitor the status of a grasp.
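To make this concrete, the sketch below shows a single step of a
standard image-based visual servoing law, in which the image-space
feature error is mapped through the pseudo-inverse of an interaction
matrix to a camera velocity command. This is a textbook formulation,
not our actual controller; the function name, the gain, and the
assumption that the interaction matrix L is known are illustrative.

    import numpy as np

    def ibvs_velocity(s, s_star, L, gain=0.5):
        """One step of a classic image-based visual servoing law (sketch).

        s      : (2N,) current image coordinates of the tracked features
        s_star : (2N,) desired image coordinates of those features
        L      : (2N, 6) interaction matrix (image Jacobian) relating
                 feature velocities to the camera's 6-DOF velocity
        Returns a 6-vector velocity command (vx, vy, vz, wx, wy, wz).
        """
        error = s - s_star
        # Proportional law: v = -gain * pinv(L) @ e drives the feature
        # error toward zero at a rate set by the gain.
        return -gain * np.linalg.pinv(L) @ error

In a grasping loop, s would come from the finger/object tracker and the
command would be sent to the arm or hand controller at frame rate.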
This research area is very rich, yet relatively little work has been
done in it (a major impediment is acquiring an actual robotic
hand, since hands are difficult to build and expensive to purchase).
Below, we outline some aspects of visual
control that are well suited to the grasping problem:
- Visually determining grasp points. This is usually a
preliminary step before actual grasping takes place, and may not be as
time critical as the manipulation task itself.
- Vision can be very important in dealing with unstructured and moving
environments, where model-based knowledge may be unavailable or
error-prone. This is an example of the active vision paradigm.
- Once a grasp has been effected, vision can monitor the grasp
for stability. By perturbing the fingers, we can measure the amount
and type of displacement of the object in image space. If the object
does not move as a securely held rigid body would, we can say that
the grasp is faulty (see the sketch after this list).
- Visually monitoring a task will give us the feedback
necessary both to perform the task and to gauge how well the
robot performed it, or whether an error has occurred.
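The sketch below illustrates the perturbation test described in the
list above. It assumes a tracker that reports the image positions of
the same object features before and after the fingers are perturbed,
and checks whether those features moved together as a single rigid
body; the pixel tolerance and the rigid-fit method are illustrative
choices, not our implementation.

    import numpy as np

    def grasp_is_stable(pts_before, pts_after, tol_px=2.0):
        """Judge a grasp by how rigidly tracked object points follow a
        small finger perturbation (illustrative sketch).

        pts_before, pts_after : (N, 2) image coordinates of the same
            object features before and after perturbing the fingers.
        A securely grasped rigid object should move as one rigid body,
        so the residual of the best-fit 2-D rigid motion is small.
        """
        mu_b = pts_before.mean(axis=0)
        mu_a = pts_after.mean(axis=0)
        B = pts_before - mu_b
        A = pts_after - mu_a
        # Best-fit rotation via the Kabsch/Procrustes SVD.
        U, _, Vt = np.linalg.svd(B.T @ A)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        predicted = B @ R.T + mu_a
        residual = np.linalg.norm(predicted - pts_after, axis=1).max()
        return residual < tol_px      # a large residual => faulty grasp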
While visual control of grasping can be very helpful, we need to
recognize some problems associated with it. The problems listed
below need to be adequately addressed in order to successfully control
grasping using vision, and are at the crux of why this is a difficult
robotics problem.
- Grasping and manipulation need real-time sensory feedback. Vision
systems may not be able to analyze the image and compute an actuator
movement fast enough.
- In grasping with a robotic hand, multiple fingers need to be employed.
This entails having the vision system follow multiple moving objects
in addition to the possible movement of any object to be manipulated.
- Grasping and manipulation usually require 3-D analysis of the relative
relationships of fingers and objects, but simple vision systems only
provide a 2-D projection of the scene (see the triangulation sketch
after this list).
- As the fingers close in on an object to be manipulated, visual
occlusion of both the object and the fingers can easily occur.
- An important component of most grasping tasks is knowledge of the
forces exerted by the fingers. Vision systems cannot directly compute
accurate force measurements.
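One standard answer to the 2-D projection problem above is to add a
second calibrated camera and triangulate. The sketch below uses the
classic linear (DLT) triangulation; the projection matrices are
assumed to come from an offline calibration, and the code is meant
only to illustrate the idea, not our system.

    import numpy as np

    def triangulate(P1, P2, uv1, uv2):
        """Linear (DLT) triangulation of one 3-D point from two
        calibrated views (minimal sketch).

        P1, P2   : (3, 4) camera projection matrices from calibration
        uv1, uv2 : (u, v) pixel coordinates of the same feature, e.g.
                   a fingertip, in each view
        """
        # Each view contributes two linear constraints on the
        # homogeneous 3-D point X.
        A = np.stack([
            uv1[0] * P1[2] - P1[0],
            uv1[1] * P1[2] - P1[1],
            uv2[0] * P2[2] - P2[0],
            uv2[1] * P2[2] - P2[1],
        ])
        # X is the right singular vector with the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]           # Euclidean (x, y, z)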
Our current research is aimed at finding solutions to these
problems. For some tasks, it may be that visual feedback can only be a
supplement to contact/force sensing.
Some sample results using snakes for tracking.
Single finger tracking
Figure 1: a. Block to be grasped and finger of hand, b. Computed snakes
for block and finger, c. Snakes overlaid on image.
Tracking a finger while the hand performs a peg extraction task.
Figure 2: a. Initial configuration of gripper and bolt to be extracted,
b. Snakes used for tracking fingers and bolt, and c. Gripper and bolt
after the bolt is extracted.
Click for an MPEG of the Toshiba gripper performing a bolt-screwing
task and for an example of real-time line extraction. (Note: the blue
and red line segments in the real-time line extraction segment are
artifacts of the MPEG encoding; the original video is in black and
white.)
WARNING: the video is 3,812,928 bytes long, so you may have to wait a
while...
Click for a description of snakes.
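For readers who cannot follow the link, the sketch below shows one
iteration of the classic Kass, Witkin, and Terzopoulos snake: a
semi-implicit step that balances internal elasticity and rigidity
against an external force pulling the contour toward image edges.
This is the textbook update, not necessarily the exact formulation
used in our tracker, and the parameter values are illustrative.

    import numpy as np

    def snake_step(img, xy, alpha=0.1, beta=0.1, gamma=1.0):
        """One semi-implicit iteration of a Kass-style snake (sketch).

        img : 2-D grayscale image as a float array
        xy  : (N, 2) closed-contour points as (col, row), N >= 5
        alpha, beta : elasticity / rigidity weights
        gamma : step-size parameter
        """
        n = len(xy)
        # Circulant internal-energy matrix built from the second- and
        # fourth-difference operators, weighted by alpha and beta.
        a = np.zeros(n)
        a[0] = 2 * alpha + 6 * beta
        a[1] = a[-1] = -alpha - 4 * beta
        a[2] = a[-2] = beta
        A = np.stack([np.roll(a, i) for i in range(n)])
        inv = np.linalg.inv(A + gamma * np.eye(n))

        # External force: gradient of the squared image-gradient
        # magnitude, which pulls the contour toward strong edges.
        gy, gx = np.gradient(img)
        fy, fx = np.gradient(gx ** 2 + gy ** 2)
        r = np.clip(xy[:, 1].round().astype(int), 0, img.shape[0] - 1)
        c = np.clip(xy[:, 0].round().astype(int), 0, img.shape[1] - 1)

        x_new = inv @ (gamma * xy[:, 0] + fx[r, c])
        y_new = inv @ (gamma * xy[:, 1] + fy[r, c])
        return np.stack([x_new, y_new], axis=1)

Repeating this step as new frames arrive lets the contour lock onto
and follow the moving finger or object boundary.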
Recent publications on this research
Billibon H. Yoshimi and Peter K. Allen.
Visual Control of Grasping and Manipulation Tasks.
In 1994 IEEE International Conference on Multisensor Fusion and
Integration for Intelligent Systems, Las Vegas, NV, Oct. 2-5, 1994.
This paper discusses the problem of visual control of grasping.
We have implemented an object tracking system that can be
used to provide visual feedback for locating the positions of fingers
and objects to be manipulated, as well as the relative relationships
between them. This visual analysis can be used to control open-loop
grasping systems in a number of manipulation tasks where finger
contact, object movement, and task completion need to be monitored and
controlled.