What is it?
Fundamentally, photogrammetry is about measurement: the measuring of the imaging subject. To perform high-quality photogrammetric measurement, the photographer capturing the photogrammetry data set must follow a rule-based procedure. This procedure will guide users regarding how to configure, position, and orient the camera towards the imaging subject in a way that provides the most useful information to the processing software and minimizes the uncertainty in the resulting measurements. These measurements will be as good or as poor as the design of the measurement structure, or lack thereof, that underlies the collection of the photographic data.
Recent technological advances in digital cameras, computer processors, and computational techniques, such as sub-pixel image matching, make photogrammetry a portable and powerful technique. It yields extremely dense and precise 3D surface data with an appropriately limited number of photos, captured with standard digital photography equipment, in a relatively short period of time. In the last five years, the variety and power of photogrammetry and related processes have increased dramatically.
Video: “Photogrammetry for Rock Art”
Watch this brief video to see an example of a petroglyph rock art panel as a 3D model created using photogrammetry.
How does it work?
CHI uses an image capture technique for photogrammetry based on the work of Neffra Matthews and Tommy Noble at the US Bureau of Land Management (BLM). The BLM Tech Note (PDF) and the 2010 VAST tutorial provide additional information regarding the origins of our methods. Neffra and Tommy have been improving their photogrammetry methods at the BLM for over 20 years. Their image capture method acquires photo data sets that are software independent and will get the most information-rich results possible from the various photogrammetry software systems on the market. CHI has been working in collaboration with Tommy and Neffra for over a decade. The four-day photogrammetry training CHI offers was developed by and continues to feature this collaboration.
The method of image capture taught by CHI is software independent. A well-captured photogrammetry data set will produce the same 3D model when processed by a knowledgeable user employing sufficiently robust software. Currently CHI uses Agisoft PhotoScan Pro software.
The most advanced photogrammetry software uses the Structure from Motion (SfM) method. The SfM approach simultaneously determines how light passes through the camera’s optical system (the camera’s calibration) and the camera’s position and orientation (pose), relative to the imaging subject, for each photo. During processing, each camera’s calibration and pose are made increasingly precise through an iterative process. This is done by iteratively refining a sparse cloud of points in the virtual scene representing the real-world environment containing the imaging subject. The points in the sparse cloud are created from the matches of similar pixel neighborhoods identified in multiple photos. If matching pixel neighborhoods are found in two, or preferably more, photos, the areas occupied by the pixel neighborhoods in the respective photos are projected into the virtual 3D scene. These projections intersect in the form of a common volume in the 3D scene and are represented as a point in the sparse cloud. The positional uncertainty (precision) of these points is reduced in a process discussed in more detail below. As the precision of the point positions increases, the precision of the camera calibration and pose also increases. When the camera calibration and pose reach a level of precision acceptable to the user, the SfM process is finished. In the following stage, PhotoScan and other software packages offering SfM use one variety or another of multi-viewpoint stereo algorithms to build a dense point cloud, which can be transformed into a textured 3D model.
Using SfM algorithms, photographic capture sets can be acquired using uncalibrated camera/lens combinations. To generate the information necessary to characterize how light passes from the imaging subject through the given optical system, SfM algorithms need a set of matched point correspondences. These matched points are found in the overlapping photographs of a planned network of images, captured from different positions and orientations relative to the imaging subject. How the camera is moved relative to the subject has a great impact on the degree of precision (positional uncertainty) present in the measurements of the associated 3D representation.
SfM differs from previous photogrammetry software tools. SfM relies solely on the photographs taken by a camera moving around the scene containing the imaging subject. No separate camera calibration is needed or desired. This feature separates SfM from older photogrammetry algorithms, which require either a precalibrated camera or an additional set of photos to calculate a calibration for the camera before pixel neighborhood matching commences.
To explain this in greater detail, the SfM software must take the information contained in the set of photogrammetry photos and optimally solve for three outcomes:
- Calibrate the camera’s interior geometry describing how bundles of light rays travel from the imaging subject through the camera’s optics to the digital sensor
- Determine the camera’s pose (position and orientation) for each photo relative to the imaging subject
- Generate a sparse point cloud of 3D points from finding and matching locations in two or more photographs that depict the same feature on the imaging subject
In SfM, errors in the camera calibration, pose, and 3D point matches are all reduced simultaneously. A precision improvement in any one of these three components (calibration, pose, or sparse cloud points) improves the precision of the other two. A complex algorithm called a Bundle Adjustment generates this three-part improvement. How the Bundle Adjustment works is beyond the scope of this photogrammetry introduction; however, it is useful to know that Bundle Adjustment algorithms are widely used in experimental science.
In SfM, the camera calibration and pose are continually improved throughout what is called the optimization operation, as the matched points’ positional uncertainty is systematically reduced within the sparse point cloud. This is usually done by iteration, removing at each stage the points in the sparse cloud that have the poorest precision. Each time the points with the poorest precision are removed, a Bundle Adjustment is run, and the calibration, pose, and point precisions improve. Points with initially poor precision, if not selected for deletion first, can have their positional precision improved over successive iterations of the Bundle Adjustment. This is one reason why not all the poor-precision points are deleted at once.
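The control flow of this remove-and-readjust loop can be sketched in a few lines of Python. This is only an illustration of the iteration described above, not how PhotoScan or any other package implements it. The function names, the 20% removal fraction, and the 0.5 pixel target are all assumptions, and `toy_adjust` is a hypothetical stand-in for a real Bundle Adjustment that simply shrinks every surviving error by 10%.

```python
def refine_sparse_cloud(errors, adjust, fraction=0.1, target=0.5, max_rounds=20):
    """Iteratively drop the least precise points, re-adjusting after each removal.

    errors   -- dict mapping a point id to its current error estimate (pixels)
    adjust   -- callable standing in for a bundle adjustment: takes the
                surviving points and returns their re-estimated errors
    fraction -- share of the remaining points removed per round
    target   -- stop once the worst remaining error is at or below this value
    """
    errors = dict(errors)  # work on a copy so the caller's data is untouched
    rounds = 0
    while errors and max(errors.values()) > target and rounds < max_rounds:
        # Remove only a small batch of the worst points, never all at once.
        n_remove = max(1, int(len(errors) * fraction))
        worst_first = sorted(errors, key=errors.get, reverse=True)
        for point_id in worst_first[:n_remove]:
            del errors[point_id]
        # Re-run the (stand-in) adjustment on the points that survived.
        errors = adjust(errors)
        rounds += 1
    return errors, rounds


def toy_adjust(errors):
    # Hypothetical stand-in: pretend each adjustment trims every error by 10%.
    return {pid: err * 0.9 for pid, err in errors.items()}


initial = {i: 0.2 + 0.2 * i for i in range(10)}  # errors from 0.2 to 2.0 px
kept, rounds = refine_sparse_cloud(initial, toy_adjust, fraction=0.2)
print(len(kept), "points kept after", rounds, "rounds")
```

Because only a small batch is removed per round, points that start out mediocre get a chance to improve through the re-adjustment before they are ever considered for deletion.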
When these three operations have yielded a high-precision, low-uncertainty camera calibration and pose, often expressed in small fractions of a pixel, the role of the SfM algorithm is finished. The sparse cloud has no further use. The Bundle Adjustment quantifies the remaining uncertainty of the SfM solution as a Root Mean Square Error (RMSE) residual. RMSE is analogous to the statistical concept of a standard deviation; the two coincide when the residuals average to zero. This level of uncertainty serves as the foundation for all subsequent measurement operations.
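To make the RMSE figure concrete, the sketch below computes it for a handful of image points. It assumes the residual is the 2D distance, in pixels, between where a feature was observed in a photo and where the solved calibration and pose reproject it. The function name and the sample coordinates are invented for illustration.

```python
import math


def reprojection_rmse(observed, projected):
    """Root mean square of the pixel distances between paired image points."""
    if not observed or len(observed) != len(projected):
        raise ValueError("need two non-empty point lists of equal length")
    squared = [
        (ox - px) ** 2 + (oy - py) ** 2
        for (ox, oy), (px, py) in zip(observed, projected)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: three features, observed vs. reprojected (pixels).
observed = [(100.0, 200.0), (150.5, 80.25), (300.0, 310.0)]
projected = [(100.3, 200.4), (150.5, 80.25), (299.7, 309.6)]

rmse = reprojection_rmse(observed, projected)
print(f"RMSE: {rmse:.3f} px")
```

Two of the three points miss by half a pixel and one matches exactly, so the result lands a little above 0.4 pixels.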
The photogrammetry software must then use Multi-Viewpoint Stereo (MVS) algorithms, informed by the knowledge of camera calibration and pose, to build a dense point cloud in virtual space, of a size determined by the user. The size of the dense cloud can reach into the hundreds of millions or billions of points. With a high-precision camera calibration and camera pose, the camera sensor that captured each photograph can be positioned and oriented in virtual space to project the photo’s pixel information through the virtual model of the lens (the calibration) in a direct line out towards the point on the virtual subject’s surface that the pixel represents.
It is important to understand that each of these projections is, in fact, a small, gradually widening “tube” from a pixel on the camera sensor to a spot on the subject. This tube encloses a small volume. This volume is the “footprint” the projected pixel covers on the surface of the subject. When the projections from multiple photos intersect on the subject’s surface, they create a commonly shared volume. When the photos are captured from rule-based positions and orientations (poses), their projections work together to make a smaller and smaller commonly shared volume. The surface point in the dense point cloud made by these intersections falls within this commonly shared volume. The smaller the common volume, the less uncertain the point’s location becomes. This also means that the point’s position in space is known with increasingly higher precision. The rule-based photogrammetric capture method designed by Matthews and Noble is explicitly designed to produce a set of viewpoints of the subject that will produce projection intersections with the smallest common volume in 3D space. When nine projections from nine properly positioned and oriented photographs intersect, the common volume will be very small and a highly precise, low positional uncertainty point will result.
When each point results from the intersection of nine well-located pixel projections, the dense cloud of points will represent a precise, measurable virtual 3D version of the original imaging subject’s surface shape.
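A simplified way to picture these intersecting projections is to treat each one as an ideal ray (a camera position plus a viewing direction) and find the single 3D point that lies closest to all of the rays in a least-squares sense. The sketch below does this in plain Python. It is a geometric illustration only: real MVS algorithms work with pixel footprints and image matching rather than ideal rays, and the function names and camera positions here are invented.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closest_point_to_rays(origins, directions):
    """Least-squares 3D point nearest to a set of rays (origin + direction)."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for origin, direction in zip(origins, directions):
        length = math.sqrt(sum(c * c for c in direction))
        u = [c / length for c in direction]
        for i in range(3):
            for j in range(3):
                # (I - u u^T) projects onto the plane perpendicular to the ray.
                m = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += m
                b[i] += m * origin[j]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel or too few to fix a point")
    point = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        replaced = [row[:] for row in a]
        for row in range(3):
            replaced[row][col] = b[row]
        point.append(_det3(replaced) / det)
    return point


# Illustrative setup: four camera positions all looking at one surface point.
target = (0.2, -0.1, 2.0)
origins = [(-0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.4, 0.0)]
directions = [tuple(t - o for t, o in zip(target, origin)) for origin in origins]

point = closest_point_to_rays(origins, directions)
print([round(c, 6) for c in point])
```

With perfect rays the recovered point coincides with the target. With real photographs the rays never meet exactly, and the spread of the rays around the solved point plays the role of the common volume described above.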
The photogrammetry software then employs surfacing algorithms, using the dense cloud’s 3D point positions and the look angles from the photos to the matched points, to build the geometrical mesh. A texture map is calculated from the color information in the pixels of the original photos and the knowledge of how those pixels map onto the 3D geometry. The result is a textured 3D model that can be measured with a known precision.
Example: Tlingit Helmet – Views of a 3D Photogrammetric Model
On its own, photogrammetry generates 3D representations without scale. The scale for the virtual representation is added during the SfM stage of processing. The scale provides the ability to introduce real-world measurement values to the virtual 3D model. At CHI, we accomplish this by adding at least three (and preferably four) calibrated scale bars of known dimension into the scene containing the imaging subject. The scale bars can be on, around, or next to the region of interest. Each scale bar must be included in multiple (at least three, preferably nine) overlapping images. Scale bars are flat, lightweight linear bars in several sizes with printed targets separated by a known, calibrated distance. The software can recognize the targets. The user then enters distances between the targets. Using calibrated scale bars can produce levels of measurement precision well below one tenth of a millimeter.
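The arithmetic behind scaling can be sketched as follows. Each scale bar gives one ratio between its calibrated length and the length measured between its targets in the unscaled model. Averaging the ratios gives a scale factor, and the leftover disagreement between bars gives a quick consistency check. This is a minimal illustration under assumed values, not the procedure PhotoScan uses internally. The function name and all lengths are hypothetical.

```python
def scale_from_bars(bars):
    """Estimate a model-to-world scale factor from several scale bars.

    bars -- list of (known_length_m, model_length_units) pairs
    Returns (scale, residuals), where each residual is the known length
    minus the scaled model length, in meters.
    """
    if not bars:
        raise ValueError("at least one scale bar is required")
    ratios = [known / measured for known, measured in bars]
    scale = sum(ratios) / len(ratios)
    residuals = [known - scale * measured for known, measured in bars]
    return scale, residuals


# Illustrative values: two 0.5 m bars and one 0.25 m bar, with the distance
# between their targets as measured in the unscaled model.
bars = [(0.5, 2.0), (0.5, 2.002), (0.25, 0.9995)]
scale, residuals = scale_from_bars(bars)

print(f"scale: {scale:.6f} m per model unit")
for (known, _), residual in zip(bars, residuals):
    print(f"  {known} m bar residual: {residual * 1000:+.3f} mm")
```

Using several bars rather than one is what makes the residuals available: a bar that disagrees strongly with the others points to a misidentified target or a mistyped distance.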
Measurement structure design is the process of defining a sensor network and the subsequent methods to process the information it collects. In photogrammetry, the sensor network is the camera’s 3D location and orientation for each photo in the capture set in relation to the imaging subject. To get the best results, this network must collect enough data so that the impact of any incorrect data is minimized. The data set must also allow the 3D measurement uncertainty of the resulting virtual 3D model to be reduced to a minimum. The design of the measurement structure is influenced by the imaging subject’s 3D features, any restrictions on the placement of cameras, and the number of images necessary to satisfy the given “accuracy” and quality requirements. The prerequisite for any successful measurement in any scientific data domain is the design of such a measurement structure. Reduction of measurement uncertainty is accomplished through the systematic reduction and elimination of error in photogrammetric image capture and its subsequent virtual 3D reconstruction.
The resolution of a surface model is governed by the area on the real-world subject that each pixel represents in the images from which the model was generated. This resolution is known as the ground sample distance (GSD). The GSD is determined by the resolution of the camera sensor, the focal length of the lens, and the distance from the subject.
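As a rough worked example, the GSD can be estimated from similar triangles: the size of one sensor pixel, multiplied by the subject distance and divided by the focal length. The sketch below assumes a simple pinhole model with the subject much farther away than the focal length. The function name and the camera values (a 36 mm wide sensor with 6000 pixels across, a 24 mm lens, and a 1 m working distance) are illustrative assumptions.

```python
def ground_sample_distance(sensor_width_mm, image_width_px,
                           focal_length_mm, distance_mm):
    """Approximate size on the subject covered by one pixel, in millimeters."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return pixel_pitch_mm * distance_mm / focal_length_mm


gsd = ground_sample_distance(
    sensor_width_mm=36.0,   # assumed full-frame sensor width
    image_width_px=6000,    # assumed image width in pixels
    focal_length_mm=24.0,
    distance_mm=1000.0,     # camera 1 m from the subject
)
print(f"GSD: {gsd:.3f} mm per pixel")
```

Halving the distance or doubling the focal length halves the GSD, which is why either change can be used to raise the resolution of a capture set.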
How to Capture Photos
A crucial element of a successful photogrammetric process is obtaining a “good” photographic sequence. Good photographic sequences are based on a few simple rules. The CHI photogrammetry training class explores the reasons behind these rules and shows how to make informed choices in the face of challenging subjects.
Here are some suggestions for the camera/lens configuration:
- Begin your project using a wide-angle lens. CHI’s first lens selection usually has a 24mm focal length.
- Choose your desired distance from the subject and focus. Then tape the focus ring in place.
- Use prime lenses rather than zoom lenses. If a zoom lens must be used, set it to one end of its zoom range (the shortest or the longest focal length).
- The camera’s aperture must remain constant during the capture sequence. On a 35mm camera, it is good practice not to set the aperture smaller than f/11. With apertures smaller than f/11, diffraction effects occur that blur the image, significantly reducing the camera’s resolution.
- Use the lowest possible ISO setting. The higher the ISO setting, the more electronic noise is generated in the camera sensor. This noise makes the matching of pixels in different photographs more difficult.
- Turn off image stabilization and auto-rotate camera functions.
- In variable light conditions (a partly cloudy day, for instance), the camera should be set to aperture priority mode (preferably f/5.6–f/11 to get the sharpest images). This locks the aperture and evens out exposure by varying the shutter speed.
- To obtain the highest precision results, ensure that the camera configuration does not change for a given sequence of photos.
- If a change of camera or lens configuration, including focus, is necessary, group the subsequent photos together in a different set from the previous photos. Calibrate the sets of photos separately, each in its own calibration group.
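The f/11 guideline above can be put into rough numbers. A common optical approximation gives the diameter of the diffraction blur spot (the Airy disk, measured to its first dark ring) as 2.44 times the wavelength times the f-number. The sketch below compares that diameter with a pixel size. The 0.55 µm wavelength (green light) and the 6 µm pixel pitch are assumed illustrative values, and the point at which the blur becomes visible depends on the sensor and the subject.

```python
def airy_disk_diameter_um(f_number, wavelength_um=0.55):
    """Approximate Airy disk diameter (to the first minimum) in micrometers."""
    return 2.44 * wavelength_um * f_number


PIXEL_PITCH_UM = 6.0  # assumed pixel size, for comparison only

for f_number in (5.6, 8, 11, 16, 22):
    diameter = airy_disk_diameter_um(f_number)
    print(f"f/{f_number}: blur spot {diameter:.1f} um "
          f"= {diameter / PIXEL_PITCH_UM:.1f} pixels wide")
```

The blur spot grows in direct proportion to the f-number, so each step toward a smaller aperture spreads the light from a single subject point across more pixels.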
How to determine where to take the photographs:
- To maintain a consistent 66% overlap, the camera must be moved a distance equivalent to 34% of the camera’s field of view between photographs, from left to right.
- Be sure to begin the first row of photos positioned such that two-thirds of the field of view is to the left of the imaging subject.
- Ensure the entire subject is covered by at least three frames in each row.
- Proceed systematically from left to right along the length of the subject and take as many photos as necessary to ensure complete coverage. The last photo of the row must have two-thirds of the field of view to the right of the subject.
- For higher quality results and greater imaging redundancy, follow the procedure below. This procedure helps lower point matching and depth uncertainty and provides essential redundancy:
  - Raise the camera vertically and aim the camera downward 15 degrees to re-photograph the previously captured row.
  - At the same time, rotate the camera 90 degrees to portrait mode and use the same 66% overlap from left to right.
  - When the second row is finished, lower the camera vertically below the first row and aim the camera upward 15 degrees to re-photograph the captured area.
  - Rotate the camera 180 degrees (for a total of 270 degrees), and again capture the area in the same way.
- It is important to maintain a sufficiently consistent distance from the subject to retain sharp focus. Use a depth of field calculator app on a cell phone to understand how much freedom of movement in depth from the subject you have for your given camera and lens configuration.
- For multi-resolution applications or to increase or decrease resolution, the user can change the camera position (closer or farther away from the subject) or change the focal length of the lens (such as 24mm to 50mm) up to a factor of twice or one-half the resolution of the previous set of photos.
  - Follow this rule for as many sets of photos as necessary to reach the desired resolution.
  - Calibrate each set of photos separately if you change the focus or the lens.
- Because of the flexibility of this technique, it is possible to obtain high-accuracy 3D data for subjects in almost any orientation (horizontal, vertical, above, or below) relative to the camera position.
- For round subjects, capture photos every 10 to 15 degrees and overlap the beginning and end photos to complete the circuit. Repeat the previous procedure to capture three rows of properly positioned photographs.
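The spacing rules above lend themselves to a quick planning calculation before going into the field. The sketch below estimates the width of the field of view at the subject, the sideways step that keeps a 66% overlap, and the number of photos in one row. It makes several assumptions: a simple pinhole model, the camera in landscape orientation (use the sensor height instead for portrait rows), and a first and last frame that each have one third of their width on the subject, which is one reading of the two-thirds rule above. The function names and the example values are invented for illustration.

```python
import math


def field_of_view_width(sensor_width_mm, focal_length_mm, distance_m):
    """Width of the area seen by the camera at the subject, in meters."""
    return sensor_width_mm / focal_length_mm * distance_m


def step_between_photos(fov_width_m, overlap=0.66):
    """Sideways camera movement that keeps the requested overlap."""
    return (1.0 - overlap) * fov_width_m


def photos_per_row(subject_length_m, fov_width_m, overlap=0.66):
    """Photos needed when the first and last frames each cover the subject
    with only one third of their width (two thirds hanging off the end)."""
    step = step_between_photos(fov_width_m, overlap)
    travel = subject_length_m + fov_width_m / 3.0
    return math.ceil(travel / step) + 1


def stations_around(step_deg):
    """Camera stations needed to circle a round subject at a given angle step."""
    return math.ceil(360.0 / step_deg)


# Illustrative plan: 36 mm wide sensor, 24 mm lens, 1 m from a 3 m long panel.
fov = field_of_view_width(36.0, 24.0, 1.0)
step = step_between_photos(fov)
count = photos_per_row(3.0, fov)

print(f"field of view: {fov:.2f} m, step: {step:.2f} m, photos per row: {count}")
print("stations for a round subject at 15 degrees:", stations_around(15))
```

Multiply the per-row count by three to budget for the additional raised and lowered rows described in the procedure above.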
Archiving the Results
Photogrammetry is archive friendly. Strictly speaking, all of the 3D information required to build a scaled, virtual, textured 3D representation is contained in the 2D photos present in a well-designed photogrammetric capture set. Today, the methods of long-term preservation of photographs are well understood. To preserve the textured 3D information of any imaging subject, all that is necessary is to archive the sets of photos and their associated metadata. When a 3D representation is desired, the archived photo sets can be used to generate or re-generate the virtual model. With a well-captured image set, a newly generated 3D representation will be the same as previous representations made with that image set. At the current rate of software and computing power development, it is likely that 3D models built from archived photogrammetry image sets will at some point be available “on demand.”
Example: Cuneiform Cone Sequence
The image sequence below shows a 3D model of a section (17 mm × 24 mm) of a cuneiform cone from the Archaeological Research Collection of the University of Southern California. The sequence is a series of increasing closeups. Each of the images shows the 3D mesh (the underlying geometry) in the upper right, with texture applied in the lower left.