Despite great advances in computing, we are still far away from the world dreamed by Asimov in 1942, where he sketched a world dominated by robots. However, robotics is a fast evolving area of scientific study, and sophisticated robots are being developed. Even LEGO advertises, "MindStorms Robotics Invention System lets you design and program real robots that do what you want them to."
Currently, the field of robotics focuses on the creation of autonomous robots. Autonomous robots are programmed to achieve a specific objective, which requires processing a certain set of tasks with varying knowledge and skill levels [8]. Using a high level specification, the goal is to formulate a sequence of intelligent movements. The execution of these procedures requires parallel processing techniques to manage the heavy computational load [2]. In spite of the significant advances in processing speed, sequential processors are far from providing sufficient computing capacity for advanced robotic planning systems. Modern VLSI technologies provide a unique opportunity to narrow this gap through the use of parallel computation [3].
Many robots must be able to perform path planning [8], which involves searching for an optimal route from an initial location to a desired one, avoiding collisions. Several parallel motion planning algorithms have been proposed to solve it [6].
This article discusses how parallelism benefits robotics. First we present the problem of understanding an obstacle laden environment. Next we present an evaluation procedure for determining feasible configurations and positions with an analysis of opportunities for parallel processing benefits. Next, a case study is presented to explain the parallel implementation we developed. Finally, the main results and conclusions are presented.
Curto and Moreno [4] propose a mathematical formalism for the calculation of C-obstacles (an obstacle representation in the C-Space) through the evaluation of a convolution of two functions; one describes the robot, A, and the other describes the obstacles, B. The Convolution Theorem states that Fourier transforms can be used to perform the convolution. A general expression, valid for any kind of robotic structure, is shown in Figure 1 (being CB the function that describes the C-obstacles).
In the next section, this general expression is analyzed for opportunities to introduce parallelism.
The three levels previously stated are valid for any robotic structure, and different solutions may take advantage of one, two or three levels of parallelism at a time, resulting in faster C-space evaluation algorithms. The design of different approaches that take advantage of one, two or three levels of parallelism are described next. A traditional Master-Slave approach is proposed for each solution.
In Figure 2, the FFT level is exploited. The master asks the slaves to perform the computation of a Fourier transform, and they proceed in parallel with the well known FFT algorithm.
Figure 3 shows the configuration variable level, where again there is a master and n slave processes. The master asks the slaves to evaluate a slice of the C-space, that is, the convolution product for one value of the configuration variable for each slave. Then the partial results are gathered and the final C-space is built.
In another proposed solution, the master gathers the convolutions products for each spatial variable from the slaves.
So far, only one level of parallelism has been exploited, but a solution can be obtained where two, or even three levels are taken into account. These parallel solutions are designed to work in environments with enough processors to allocate a significant number of processes. Then, we can have FFT-Configuration, FFT-Spatial, Configuration-Spatial and FFT-Configuration-Spatial solutions. In Figure 4, a more complex model is illustrated where a three level solution is exploited. Submaster processes organize the communications of partial results, that are then communicated to the master. This method can be seen as a single level nested solution.
It should be noted that these solutions do not address issues associated with validating the models. Such issues could be associated with the existence of bottlenecks, memory availability, and synchronization problems, to name a few. The reader only needs to imagine how difficult it would be to reach optimal performance in the case of the model presented in Figure 4, where the synchronization of multiple processes of different types (master, submasters, sub-submasters, slaves) is needed, and how easy it would be for one to enter a long wait state that would slow down the overall performance. The details of these issues are left to a more in depth article.
The robot has a planar movement so its position is defined by three configuration variables (xr, yr, thetar) , as described in [4], the expression to evaluate the C-space is shown in Figure 5.
Three possibilities exist that appear to take advantage of inherent parallelism. First, the algorithm consists of iteratively performing a series of basic operations one for each of the possible robot orientations, thetar, (i.e., the configuration variable level). Second, each orientation needs to accumulate the convolution products for each plane z (i.e., the spatial variable level). These calculations are independent, therefore they can be done in parallel. These intermediate results are integrated to form a final result. Clearly, this is an instance of the data parallelism model [1].
Next, the operations that carry out the two types of processes are described.
In order to work with the FFT algorithm employed for the needed evaluation of 2D Fourier transforms, three discretizations of the 3D workspace have been used: 64 x 64 x 64, 128 x 128 x 128 and 256 x 256 x 256 (All of them are powers of two as needed by the FFT). We chose to use the FFTW implementation of the FFT algorithm, developed at MIT, which benchmarked well and permits the generatation of parallel code.
In Figure 6a, a robot faced with three obstacles is shown; it is clear that the robot has a reduced set of options to circumnavigate. The corresponding C-space is shown in Figure 6b, where two small sets of free collision configurations appear inside a big C-obstacle.
Based on the experience presented in [10], the best
results are obtained when one processor doubles as both master and slave.
Figure 6.c
illustrates the master as mostly idle (red)
while the four slaves are mostly working (green). The results, in the
following, have been obtained with this setup.
In Table 1, Table 2, and Table 3, show calculation times for the execution of the fine grain parallel algorithm and the sequential algorithm at different resolutions and robot heights. A trend is observed in the speedup as a function of resolution. Most likely the trend comes form communications penalties when the time dedicated to pure calculation is small, as in the case of a resolution of 64x64 in the bitmaps. Table 3 shows a smaller performance gain because of memory limitations.
| Sequential | Parallel | Robot height | Speedup |
| 0.99 | 0.73 | 5 | 1.36 |
| 3.34 | 1.23 | 20 | 2.72 |
| 7.30 | 2.40 | 45 | 3.04 |
| 10.51 | 3.21 | 64 | 3.27 |
| Sequential | Parallel | Robot height | Speedup |
| 14.30 | 5.22 | 10 | 2.74 |
| 52.81 | 15.21 | 40 | 3.47 |
| 85.05 | 23.45 | 64 | 3.63 |
| 164.06 | 44.34 | 128 | 3.70 |
| Sequential | Parallel | Robot height | Speedup |
| 252.45 | 76.67 | 15 | 3.29 |
| 376.54 | 107.51 | 30 | 3.50 |
| 792.70 | 226.77 | 64 | 3.50 |
As Figure 6c and resulting speedups reflect, the performance of the applications is optimum: slave processes are working most of the time (green) and the master process is waiting (red) for the partial results of the assigned tasks and receiving the results (yellow).
As a practical application, a three dimensional mobile robot is considered. Limitations of test environment prompted development and implmentation of a solution optimizing at the configuration variable level. Different experiments have been carried out in which the resulting speedups are very acceptable. The presented case study provides reasonable results that could be extrapolated to more complex robotic structures. More complex structures can hopefully be designed on the lines of the parallel algorithm studied in this paper.
Convolution Theorem - There are two ways of expressing the convolution theorem: 1) The Fourier transform of a convolution is the product of the Fourier transforms. 2) The Fourier transform of a product is the convolution of the Fourier transforms. Convolutions can be very difficult to calculate directly, but are often much easier to calculate using Fourier transforms and multiplication.
DFT (Discrete Fourier Transform) - DFT is the numerical algorithm to compute Fourier Transform. Because a digital computer works only with discrete data, numerical computation of the Fourier transform of f(t) requires discrete sample values of f(t). In addition, a computer can compute the transform F(s) only at discrete values of s.
FFT (Fast Fourier Transform) - The Fast Fourier Transform is a DFT algorithm developed by Tukey and Cooley in 1965 which reduces the number of computations from N2 to N log N. There are basically two types of Tukey-Cooley FFT algorithms in use: decimation-in-time and decimation-in-frequency. The algorithm is simplified if N is chosen to be a power of 2, but it is not a requirement.
FFTW (Fastest Fourier Transform in the West) - FFTW is a C subroutine library for computing the Discrete Fourier Transform (DFT) in one or more dimensions, of both real and complex data, and of arbitrary input size. FFTW is typically faster than other publically-available FFT implementations, and is even competitive with vendor-tuned libraries. The FFTW package was developed at MIT by Matteo Frigo and Steven G. Johnson. http://www.fftw.org
Fourier Transform - The Fourier transform, in essence, decomposes or separates a waveform or function into sinusoids of different frequency which sum to the original waveform.
Load balance - In the parallel implementation of any algorithm the performance depends critically on balancing the load (including workload-related and design-related factors) between the parallel processors. The idea is to keeping all processors busy for most of the time. Load balance describes how well tasks (basic units of work assigned to a processor) are distributed across different processors. Surprisingly, there is no commonly agreed upon definition of load balance in the literature.
Master-slave paradigm - A central master task is given responsibility for problem allocation. Each slave repeatedly requests and executes a problem from the master. Slaves can also send new tasks to the manager for allocation to other slaves. The efficiency of this strategy depends on the number of slaves and the relative costs of obtaining and executing problems.
MPI - The Message Passing Interface Standard. A multilateral gathering of parallel computing users, vendors and researchers has specified a complete interface for message-based interprocess communication. The MPI standard means true portability for parallel programs. It defines the necessary infrastructure for third party software products and will enable a wider proliferation of parallel technology. The specific semantics of the communication interface, its perceived popular and future prospects, need no longer be a factor in choosing an environment or a vendor for developing parallel applications. Users and buyers can focus on other factors such as efficiency of implementation and related development tools.
Speedup - Speedup is the ratio of the serial of the best sequential application for solving a problem, to the time taken by a parallel algorithm to solve the same problem on n processors.
VLSI (Very Large Scale Integration) - the process of placing
thousands (or hundreds of thousands) of electronic components on a single
chip. Nearly all modern chips employ VLSI architectures, or ULSI (ultra
large scale integration). The line between VLSI and ULSI is vague..
This study was partially supported by the Junta de Castilla y Leon trough SA02-00F research project. The authors would like to thank all who made comments and suggestions regarding this paper.
Roberto Theron is a Ph.D. candidate in Computer Science at University of Salamanca, Spain. His research interests include Parallel programming and Robotics. In 1999 he began to teach computing in the Department of Computers and Automation of the University of Salamanca.
Francisco Javier Blanco is a Ph.D. candidate in Computer Science at University of Salamanca, Spain. His research interests include Robotics, particularly in path planning area. In 1999 he began to teach computing in the Department of Computers and Automation of the University of Salamanca.
Dr. Belen Curto, lecturer professor at the University of Salamanca with main research interest in Robotics.
Dr. Vidal Moreno, associate professor at the University of Salamanca with main research interest in Robotics.
Dr. Francisco José Garcia, lecturer professor at the University of Salamanca with main research interest in Software Engineering.
Want more Crossroads articles about Parallel Computing? Get a listing or go to the next one or the previous one.
Last Modified:
Location: www.acm.org/crossroads/xrds8-3/prm.html