Studies on development of stereo vision system necessary for mechanization of apple fruits harvesting
2002
Takahashi, T. (Hirosaki Univ., Aomori (Japan). Faculty of Agriculture and Life Science)
An apple harvesting machine must be able to discriminate an apple from its surroundings, to determine the apple's relative location, and to measure its three-dimensional coordinates relative to the base point of the harvesting machine. Binocular stereo vision is an available measuring method for obtaining such 3-D information in an outdoor field. However, it is very difficult to apply this method to an automated machine because practical solutions to the correspondence problem of stereopsis have not been found. The purpose of this study is to establish a practical measurement method of binocular stereo vision for obtaining 3-D information regarding fruit and their surroundings, and to develop a stereo vision measurement system that can be applied to the mechanization of automatic fruit harvesting. Four main subjects are addressed in this study: (a) the mechanical characteristics of distance measurement using a binocular stereo vision system, (b) the establishment of a new principle for distance measurement in the developed vision system, (c) the analysis of and solution to the correspondence problem, and (d) how the main factors of input images influence measurement accuracy. Moreover, the application of the method was considered in regard to measuring fruit shape and controlling the position of a manipulator for harvesting.

(1) A trial system of binocular stereo vision
Measurement using binocular stereo vision is achieved by processing a set of images obtained by two cameras. The trial stereo vision system for this study consisted of two CCD color video cameras, a notebook personal computer, a PC card for video capture, a camera table that allows pan and tilt, and a changeover switching box for input images. The cameras had a CCD of 380,000 pixels and an NTSC signal output. Their operation was controlled by the computer.
When automatically harvesting fruit in an orchard, it is necessary to search for and discriminate individual fruit by processing a set of images. Therefore, the cameras were equipped with a pan-tilt mechanism, a zoom mechanism, and a control mechanism so that the trial system could photograph an individual fruit at a desirable size in a central position. The cameras' control mechanism was operated from the computer, either manually or automatically by a program, over a VISCA network through an RS-232C interface. The accuracy of distance measurement on the trial system is influenced by the imaging and distance characteristics of each camera lens system and by the condition of the optical system of stereo vision. Even a small error in the optical stereo system rapidly degrades the accuracy of the whole system, and the effect is more pronounced at long distances. Therefore, the trial system's hardware was carefully calibrated as follows. First, the characteristics of the left and right cameras were examined individually in regard to their distance and zoom values. By performing regression analysis on the results, the following relations and factors were determined: a correction equation for lens aberration, a relationship between the cameras' focal lengths and focus distances, a relationship between zoom value and focal length, an equation for the center of a virtual lens, and a scale factor of monitor size in regard to the size of the camera images. Next, the distance equations of the optical stereo system were calibrated using the results of both cameras. The results showed a distance error in the range of 2% at distances of 1 m to 4.5 m.
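The rapid loss of accuracy at long distances noted above can be illustrated with the textbook parallel-axis pinhole model, in which depth is Z = f*B/d and a disparity error of dd pixels produces a depth error of roughly Z^2/(f*B)*dd. The following is a minimal sketch under that assumption only. It is not the calibrated distance equation of the trial system, and the function names, the focal length (800 px), and the baseline (0.1 m) are hypothetical values chosen for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for an idealized parallel-axis stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth error: dZ = Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    FOCAL_PX = 800.0    # hypothetical focal length in pixels
    BASELINE_M = 0.1    # hypothetical camera separation in metres
    for depth in (1.0, 2.0, 4.5):
        err = depth_error(FOCAL_PX, BASELINE_M, depth, 1.0)
        print(f"depth {depth} m: one-pixel disparity error -> {err:.3f} m")
```

Because the error grows with the square of the distance in this model, doubling the distance quadruples the depth error caused by the same disparity error, which is consistent with the need for careful calibration described above.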
(2) Development of a measurement method by composing left and right images
The conventional principle of measuring distance using binocular stereo vision is to fixate on a single object from a left and a right viewpoint and to calculate distance by triangulation. If the object cannot be made to correspond between the left and right images, the distance between it and the viewer cannot be calculated. In general, Marr's three restrictions and an epipolar constraint are used for solving the correspondence problem, but they are insufficient. In the present study, a new principle of measurement using binocular stereo vision was established in light of the optical physics of stereo vision and of optic nerve physiology and psychology. The principle is based on the fact that when a set of left and right images is overlapped at each cross section of space in the direction of depth, an object's image becomes clear if the object intersects that section. Namely, a cross section of the search space in the direction of depth is determined by gazing, and a composite image of the left and right images is made on the section. If clarity is detected on the section and an object exists on it, then information regarding its position and distance is obtained. A composite image was made by alternately selecting the even lines of the left image and the odd lines of the right image. If the left and right images of an object overlap well, the color difference between the horizontal lines becomes minimal, and the clarity of the color composite image increases. As the difference between the position of the left image and that of the right image increases, the color differences between the horizontal lines increase, and the clarity of the composite image decreases. The index of clarity at each pixel of a composite image is described by the variance of the RGB values of the three primary colors.
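The composition step described above can be sketched as follows. This is one plausible reading, not the study's implementation: images are assumed to be equal-sized nested lists of (R, G, B) tuples, the cross section in depth is represented by a horizontal pixel shift of the right image, and the clarity index is taken as the variance of each colour channel over a pixel and its two vertical neighbours, averaged over the image. The function names compose, clarity_index, and best_shift are hypothetical.

```python
def compose(left, right, shift):
    """Interlace rows: even rows from the left image, odd rows from the
    right image moved horizontally by `shift` pixels."""
    height, width = len(left), len(left[0])
    out = []
    for y in range(height):
        if y % 2 == 0:
            out.append(list(left[y]))
        else:
            row = right[y]
            # Clamp the source column at the border so every pixel has a value.
            out.append([row[min(max(x - shift, 0), width - 1)]
                        for x in range(width)])
    return out


def clarity_index(img):
    """Mean RGB variance between each pixel and its vertical neighbours.
    Smaller values mean the interlaced rows agree, i.e. a clearer image."""
    height, width = len(img), len(img[0])
    total = 0.0
    for y in range(1, height - 1):
        for x in range(width):
            for c in range(3):
                vals = [img[y - 1][x][c], img[y][x][c], img[y + 1][x][c]]
                mean = sum(vals) / 3
                total += sum((v - mean) ** 2 for v in vals) / 3
    return total / ((height - 2) * width)


def best_shift(left, right, shifts):
    """Return the candidate shift whose composite image is clearest."""
    return min(shifts, key=lambda s: clarity_index(compose(left, right, s)))


if __name__ == "__main__":
    bg, red = (50, 50, 50), (200, 30, 30)
    # Synthetic pair: a red block three pixels further left in the right image.
    left = [[red if 6 <= x < 9 else bg for x in range(12)] for _ in range(6)]
    right = [[red if 3 <= x < 6 else bg for x in range(12)] for _ in range(6)]
    print(best_shift(left, right, range(0, 7)))
```

Sweeping the shift and keeping the minimum of the index corresponds to searching the cross sections in depth for the one on which the object appears clear. Converting the selected shift to a distance would require the calibrated distance equations, which are not reproduced here.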
The variance represents the color difference in the vertical direction, and its value reaches a minimum where the color difference is smallest. However, the value also decreases if the overlapped area is large or the shape is similar to that of other small areas. Therefore, the average variance of RGB values from pixels in the eight directions was used to compare large areas. The basic characteristics of this new method were examined in terms of focal length, photographing distance, the calculation range of the average variance, disparity interval, and object width, using red circular plates. The results showed that when the calculation range of the average variance was set to the total width of an object, a composite color image and a depth image were suitably obtained under a variety of conditions. Moreover, an experiment to verify the method on objects resembling red apples was carried out in an orchard during the harvesting season. The results showed that the distance error was in the range of 4% at a distance of 2.2 m, and confirmed that the distance of ripe apples can be measured using this new method.

(3) Analysis of and solution to the correspondence problem
Images that give rise to the correspondence problem, or that contain similar image features, such as a row of similar fruits, overlapped fruits, and occluded or deformed images hidden by leaves and branches, are frequently found among fruit images taken in apple orchards. The solution to, or a countermeasure against, the problem is to apply the present method of binocular stereo vision to such images. The principle of measurement in this case is to compose a central image from the left and right images at each cross section by disparity, and to detect a clear image in it. The occurrence of the correspondence problem is then restricted because correspondence error is reduced. However, the way conventional TV cameras produce images differs from the principle mentioned above, in that they capture the whole space as one image.
Therefore, clear but false images appear frequently owing to correspondence errors in images affected by the correspondence problem, even if the principle of this study is applied. It is necessary to establish a suitable correspondence of cross sections by disparity, and to take corrective measures for restricting the appearance of false images. As a typical case of the correspondence problem in this study, the appearance of false images and their characteristics were examined for two circular plates that were on the same epipolar line and had the same color and shape. In this case, a method of composing the same half side of the visual fields was attempted as a measure to make the cross sections by disparity correspond and to restrict false images. In this method, the left side of an image is constructed from the left half of the left image and the left half of the right image. The other side of the image is constructed from the right half of the left image and the right half of the right image. This form is analogous to the crossing of the optic nerves, the optic chiasm, in the human visual system. The results verified the relative position between an object and a viewer, and showed that the correspondence of the cross sections of the left and right images by disparity was also suitable. The composition of the same side of the visual fields restricted the appearance of false images, because the average variance of RGB values in false images was larger than that in true images. Namely, the false images appeared at the edge of the common fields on the same side of the visual fields or outside them. Experiments on a row of four circular plates and on an image of fruits hidden by leaves were performed. The results showed that the method mentioned above was effective in creating suitable correspondence of cross sections by disparity, and in restricting false images.
Moreover, this method was applied to 16 pairs of images of fruit in a single row and 12 pairs of images of overlapping fruit, which were taken in an apple orchard. The results showed that suitable correspondence of cross sections by disparity was obtained in 90% of the images of fruit in a row and in 80% of the images of overlapping fruit at a level within two disparity intervals, while the error of distance measurement was about 5% with the composition on the same side of the visual fields.

(4) Influence of the conditions of input images on measurement accuracy
A special characteristic of the present measurement method of stereo vision lies in its ability to obtain a color composite image and its depth simultaneously. Therefore, it is expected that accurate 3-D information regarding fruit and its surroundings is obtainable. However, images taken outdoors by a video camera are influenced by factors such as natural light and photographing conditions. Thus, it is necessary to clarify the influence of these factors on the measurement accuracy of the method described in this study. The factors contributing to measurement errors were considered. Thereafter, the relations between the illuminant, the fruit, and the photographing conditions were investigated, and their influence on measurement accuracy was analyzed using images taken under various conditions in apple orchards. The results showed that, except in backlight conditions, the brightness of the fruit images was proportional to the illumination of the fruit surface, depending on the automatic exposure function, and was more than 90 (35%) even if the illumination of the fruit was less than 1 klx. Under backlight conditions, the lighter and clearer the sky, the more rapidly the brightness of the fruit decreased, and it was less than 90 even though the illumination of the fruit was from 6 to 10 klx. The average errors of distance measurement ranged from -3 to 0% under conditions of direct and scattered sunlight.
In characterizing the average variance of RGB values, change patterns of the V-U types accounted for 75%, while the N type accounted for 15 to 20%. The minimum level of most types was from 200 to 1000, which accounted for 70%. The type showing the greatest difference between measured distance and actual distance was the one with -2 to 1 on disparity interval, and its rate was more than 90%. From the results of all images (a total of 186 pairs of images: 144 pairs of 'Fuji' and 42 pairs of 'Orin'), the range of application of the trial system for red and yellow-green apples was a brightness of more than 90, a rate of overlap within 0.5, and a fruit image width of more than 16 pixels for 'Fuji' and 26 pixels for 'Orin'. Within this range of application, the errors of distance measurement were -4 to 2% at distances of 1.2 to 3.5 m, and -7 to 0% at distances over 3.5 m.

(5) Application of stereo vision to measurement of 3-D shape and position control
It was also considered how the method used in this study can be applied to measuring the 3-D shape of an individual fruit grasped by a picking hand, and how the 3-D visual information obtained by the stereo vision system can be used for position control of a manipulator equipped with a picking hand. In regard to the measurement of a 3-D shape, the results of experiments on a Rubik's Cube and a red apple showed that color composite images and depth images were satisfactory when the distribution of RGB values from low to high was wide, with a depth error within 5%. However, significant error occurred if there were similar RGB densities on the left and right sides of an image. These results showed that the method used in this study is applicable for measuring 3-D shapes when the gradation of brightness or color density is large. The criteria for the calculation range of the average RGB variance and for the clarity index should be improved to deal with false correspondence.
In general, the shape of an individual fruit or branch is perceived by examining its contour, and the time required for image processing is relatively long. For high-speed processing, a method that uses patterns on small segments of a digitally drawn line was considered. The drawn line form was represented by pattern units of 3 by 3 pixels. When 162 line element patterns and 110 angle cell patterns were defined, the precision in detecting a shape element was about 5 degrees in inclination for a line of over 20 pixels, 5 degrees in the tangent angle of an arc whose radius was more than 30 pixels, and 10% in the radius of curvature for a central angle of over 80 degrees. The results on images of fruit showed approximate values for the angles of the contours. It is important to reduce noise in images and to perform efficient binary processing in order for this method to be used in a practical setting. Controlling a manipulator so that a picking hand approaches an individual fruit requires the inverse calculation of the action angles of the manipulator's joints. Yet such a calculation might delay real-time control. Therefore, an algorithm that does not require an inverse calculation, but uses both the visual coordinates obtained by stereo vision and basic patterns of action at given points, was examined by numerical simulation. In the practical use of the algorithm, it is necessary to decide the given points, their interval, and their number based on the characteristics of the action patterns among the given points in order to keep the precision of the approach within the tolerance level.

(6) Conclusion
It was verified using hardware that a method of binocular stereo vision had sufficient ability to measure distance for the purpose of obtaining 3-D information for the mechanical harvesting of fruits.
To specify this method's performance, it is important to determine the factors and relationships governing the distance error of both the left and right cameras, to detect and correct the cross angle of the optic lines between the left and right cameras, and to ensure that each pixel corresponds between the left and right images. The difference between the principle of binocular stereo vision utilized in this study and the conventional method is that the range of space searched for correspondence is restricted to cross sections of distance by disparity along a line of sight. Namely, a constraint of space on the common visual field of the left and right cameras was considered in addition to the epipolar constraint. Therefore, distance can be initially calculated even if the correspondence of the left and right images is not achieved completely. However, with image acquisition using conventional TV cameras, it is impossible for the left and right cameras to take and record images of the same cross section simultaneously. Thus, distance errors might occur because differences between the cross sections by disparity of the left and right images create false images during processing. Moreover, the conditions of gaze distance and disparity interval influence the processing time and the appearance of false images. From the standpoint of this study, the correspondence problem of binocular stereo vision is encountered when differences between the cross sections by disparity of the left and right images occur frequently. Therefore, finding a solution to the problem will be difficult using conventional TV cameras. However, it was possible to judge the correspondence of the cross sections by disparity of the left and right images by comparing the amount of a specific color on the same side of the left and right images. Moreover, composition using the same side of the images made it possible to measure distance without the appearance of false images.
This method can serve as a next step toward a solution to the correspondence problem. The relations between the distribution of the specific color of an object and the processing range should be considered, because the color distribution influences the amount of the specific color on the same side of the left and right images. The accuracy of this method for measuring distance in an apple orchard was sufficient for the purpose of harvesting fruit, provided that the conditions of the input images were within the application range mentioned above. The camera requires a specific function to detect and correct the influence of backlight, because this method cannot process images taken in strong backlight. Lastly, this study introduced and applied a new principle of binocular stereo vision for use in automatic fruit harvesting in an apple orchard, and proposed guidelines and basic materials for developing a stereo vision system. The areas remaining to be addressed in regard to the system's practical application include the following: on the software side, improvement of the clarity index and of the criteria for determining line-of-sight distance and disparity interval; on the hardware side, a multi-processor system for high-speed processing, improvement of image resolution, and simplification of the camera control mechanism.