"Before choosing project performance measures, consider how performance will be estimated."
After constructing influence diagrams, the next step is to specify performance measures, also called attributes. The performance measures are the metrics to be used to quantify performance relative to objectives. Figure 19 shows where this step fits into my 12-step model-construction process.
Figure 19: The step focused on specifying performance measures.
A Performance Measure is Needed for Each of the Lowest-Level Objectives in the Objectives Hierarchy
A performance measure is needed for each of the lowest-level objectives in the objectives hierarchy. You don't need performance measures for intermediate or higher-level objectives because the rule for decomposing objectives is to define enough sub-objectives to account for every component or dimension of the higher-level objective. If the objectives hierarchy is complete, measuring performance relative to the lowest-level objectives will capture all attributes needed to quantify the degree to which every intermediate objective is achieved, as well as the degree of achievement for the top-most objective in the hierarchy.
If, when specifying performance measures, it becomes apparent that some significant component of a higher-level objective does not appear in the objectives hierarchy, then you should return to the step of constructing the hierarchy and add the sub-objectives needed to capture the missing component.
Consider How Performance Measures Will Be Used
To select desirable performance measures, it is useful to consider the purposes of performance measures:
Consider How Performance Will Be Estimated
As for the question of how performance will be estimated, most authors identify two alternatives. One option is to obtain performance estimates directly from "experts": individuals selected from the organization based on their understanding of projects and the likely outcomes of those projects. With this approach, the expert estimates performance relative to each objective based on judgment, aided, perhaps, by influence diagrams that document the factors and influences that the expert should consider. Experience shows that so long as performance measures match the way the expert thinks about performance, such estimates can usually be obtained quickly and easily.
The other approach for obtaining estimates of performance is to construct a model that simulates the consequences of projects and produces as output measures indicating the degree to which each objective is achieved (see the next page). With this approach, the influence diagrams provide a blueprint for constructing a quantitative consequence model: the bubbles suggest model variables, and the arrows identify relationships that the consequence model must quantify. With the modeling approach, you may not need to select measures most familiar to your experts, but you'll still need performance measures that are comprehensible to those who need to understand the basis for project decisions.
A Model is Almost Always Needed to Simulate the Consequences of Projects Over Time
In reality, there aren't just two approaches for measuring project performance; there is a continuum of possibilities corresponding to the different types of models and levels of modeling sophistication that can be employed. As mentioned previously, many projects produce assets or otherwise create benefits that persist over some extended period of time. The value of such a project depends on how the changes in performance resulting from the project vary over time.
It is usually not very efficient to ask an expert to provide an estimate of project performance at each of a large number of points in time. Instead, time-varying performance is approximated by asking the expert to estimate when project impacts would begin and when they would end, and to provide a few additional estimates that, along with some assumptions, allow a curve representing time-varying project performance to be generated. So, even in the simplest case, a model is typically used to simulate project performance. The question really isn't whether or not to construct a consequence model; the question is how sophisticated that model should be. Like all questions of analysis, the answer boils down to a judgment about the merits of decomposition. In every case, however, the measures used to quantify project performance need to be comprehensible to decision makers and to others who want to understand the reasoning behind project choices.
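To make the idea of generating a curve from a few expert estimates concrete, here is a minimal sketch. The curve shape (a linear ramp to a plateau, then an abrupt end), the function name `performance_profile`, and all parameter names and numbers are assumptions chosen for illustration; they are not taken from any particular priority system.

```python
def performance_profile(start, end, peak, ramp_years, horizon):
    """Return yearly performance levels for years 0 .. horizon-1.

    Assumed (hypothetical) shape: zero before `start`, a linear ramp up to
    `peak` over `ramp_years`, flat at `peak`, and zero from `end` onward.
    """
    profile = []
    for year in range(horizon):
        if year < start or year >= end:
            level = 0.0  # impacts have not begun yet, or have ended
        elif ramp_years > 0 and year < start + ramp_years:
            # Fraction of the ramp completed by the end of this year.
            level = peak * (year - start + 1) / ramp_years
        else:
            level = float(peak)  # plateau
        profile.append(level)
    return profile


# Four expert estimates plus a planning horizon define the whole curve.
curve = performance_profile(start=2, end=8, peak=100.0, ramp_years=2, horizon=10)
total_performance = sum(curve)  # undiscounted total over the horizon
```

The expert supplies only the start, end, peak, and ramp length; the assumed shape fills in the remaining years. A more sophisticated consequence model would replace the assumed shape with relationships taken from the influence diagram.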
Types of Performance Measures
Different types of performance measures are available for quantifying the degree to which conducting a project contributes to the achievement of an objective. In addition to the choice of measuring the absolute level of organizational performance versus the increment in performance due to a project, you can choose from three types of performance measures: natural measures, proxy measures, and constructed scales. Each of these types of performance measures is widely used in practice, and it is common for a priority system to use one type of measure for some objectives and other types for other objectives.
Natural Performance Measures
A natural measure directly measures the degree of achievement for an objective. For example, if "minimize costs" is an objective, the natural measure would be "cost, expressed in dollars" (or whatever the unit of local currency happens to be). If an objective for a system for prioritizing transportation projects is "reduce the number of accidents," a natural measure would be "number of accidents." If "increase sales" is an objective, a natural measure would be "number of sales."
As demonstrated by these examples, a natural measure has a clear, straightforward relationship with the objective; it is the "natural way" to measure the objective. A natural measure is generally something that is easy to observe, and can be counted or otherwise physically measured. An influence diagram for an objective having a natural measure would, arguably, consist of a single node, the natural measure. Yes, you could probably identify factors that influence the natural measure, but those influencing factors would likely not provide an easier or more precise way to measure the objective. Because a natural measure leaves so little chance for misinterpretation, there must be a strong reason not to choose it as the objective's preferred performance measure. When natural measures are used, technical experts, decision makers, and other stakeholders will have the same understanding of a project's performance relative to the objective.
Proxy Performance Measures
Unfortunately, natural measures do not exist for many objectives, and even when one exists there may be good reasons for not using it. An option in such cases is to choose a proxy measure. Proxy measures, as the name implies, measure something related to the objective, not the objective itself. Most often, a proxy measure will be something believed to cause, or be caused by, what would be a natural measure for the objective. In such cases, the proxy measure may be called an indirect measure. Regardless of whether the variable is called a proxy or an indirect measure, the concept is the same: the measure is something believed to correlate with the degree to which the objective is attained. To document the nature of the presumed relationship between a proxy measure and its objective, the proxy measure should either already appear in the objective's influence diagram (if it is something that influences the degree to which the objective is achieved), or it may be added to the diagram to document the proxy measure's presumed relationship to the achievement of the objective.
To provide a common example of a proxy measure, consider a manufacturing firm that has selected as one of its objectives "maximize the quality of our products." Quality is an imprecise notion, and the organization may not have any direct way to measure product quality. However, an easily measured proxy might be "percentage of sales returned." This measure is neither a direct measure of quality, nor is it something that indirectly determines quality. However, the percentage of sales returned is something that likely correlates with product quality. This proxy measure is not ideal because it ignores customers who are disappointed with the product quality but do not return the product. It also counts customers who return the product simply because they decide it isn't something they want after all. A proxy measure is less revealing than a natural measure, but using a proxy measure may be necessary for objectives for which natural measures either don't exist or are too difficult to calculate.
Another Example of Using a Proxy for a Hard-to-Measure Objective
Another example of the deliberate selection of a proxy measure is provided by the priority system developed for the U.S. Department of Energy (DOE) to rank candidate sites for locating an underground facility for disposing of spent nuclear fuel (such a facility is called a "repository"). The priority system had as one of its objectives, "minimize the probability of incidences of cancer due to radioactive releases from the site." This objective was chosen because regulations governing the selection of a repository site explicitly require that such an objective be used. The natural measure for this objective would be "number of cancers caused by the repository." However, predicting the probability of a cancer from a repository constructed at a site would be very difficult since it depends on, among other things, the number of people who in the future would live near the site, something not easily estimated.
To avoid the difficulties of estimating cancers, an indirect measure was chosen for the objective: the amount of radiation estimated to be released by a repository at the site, expressed in becquerels. The becquerel (symbol Bq) is a unit of radioactivity; here it quantifies the radioactive material released to the environment as a gas or liquid. The site ultimately chosen for the repository was Yucca Mountain in Nevada. None of the other candidate sites was estimated to produce a lower amount of radioactive releases than that estimated for Yucca Mountain.
By the way, one Bq is a very small amount of radioactivity; it is defined as the activity of a quantity of radioactive material in which one nucleus decays per second. It has been estimated that the 2011 earthquake off the coast of Japan, which damaged the Fukushima nuclear power plant, resulted in a total release of between 340 and 800 PBq (one PBq is 10^15 Bq, that is, a 1 followed by 15 zeros).
Proxy Measures for Complex, Multi-Dimensioned Objectives
Proxy measures are often used to measure achievement of complex, multi-dimensional objectives. The prioritization of alternative sites for the repository provides an example of this use of a proxy. The candidate sites were mostly in remote, undisturbed locations, where, in some cases, historical properties, such as evidence of centuries-old Native American camping grounds, have been found. A concern expressed for such sites is that disturbing historical Native American properties in or near the footprint of the repository would adversely affect the traditions of the affected tribes. No obvious natural or indirect measure was available for quantifying impacts on the traditions of Native Americans. The proxy variable chosen was the number of cultural and historical properties that would be lost or moved if a repository were to be constructed at the site. The assumption was that constructing a repository at sites with more historical and cultural properties would likely have a greater adverse impact on the preservation of tribal traditions and culture. The measure is a proxy because there is no direct or indirect basis for concluding that the number of historical properties impacted has any bearing on the preservation of Native American culture.
Proxy Measures that Measure Means Objectives
As illustrated by several of the above examples, indirect measures are often used in situations where the degree of achievement relative to a fundamental objective depends on the organization's projects, but also depends on other factors that may be uncertain and have little or nothing to do with the projects that are conducted. Trying to compute the net impact of project and non-project factors may require a lot of work. Accounting for the project's impact on something related to the objective may be much easier and produce an indicator that reflects more directly the consequence of project decisions.
Sometimes, the selected indirect measure for a fundamental objective is a natural measure for a means objective. This was the case for the repository priority system's use of radioactive releases as a proxy for cancers. It was also the case for the previous-page example wherein members of a citizens' advisory group chose to use distance as a proxy for public health risk. Using a proxy measure that is actually measuring a means objective poses the dangers previously expressed for putting means objectives in the objectives hierarchy; namely, a means objective misses other ways of achieving the fundamental objective. Thus, using a repository's estimated radioactive releases as a proxy for cancers ignores site characteristics that influence the likelihood that people would live near and thereby be exposed to the repository's radioactive releases. From this perspective, putting a repository in a location unattractive to people might make more sense than putting it in an attractive living location. In this particular case, because Yucca Mountain is located in a harsh desert environment, had this consideration not been left out of the analysis of alternative repository sites, the results would, if anything, have provided more support for the selection of Yucca Mountain. The point is, it is important to recognize when selected proxies measure means objectives. If so, then you can consider whether the biases created by the simplification strengthen or weaken the case for believing priorities computed by the model.
When to Use Proxy Measures
Proxy measures are most often useful for objectives that are complex, multi-dimensional, and difficult to measure, such as advancing an organization's capabilities, enhancing quality of life for people living in a local community, or minimizing impacts to a habitat for a threatened or endangered species. Oftentimes it is easier to estimate an indirect measure than the natural measure for the objective. For example, if an objective is to minimize the impact on some threatened or endangered species, a direct measure might be the number of individual animals that survive and reproduce. An indirect measure might be the number of available acres of habitat. The indirect measure might be easier to estimate based on the acreage of habitat that the project impacts. Likewise, estimating the health impacts that result from the exposure of people to air pollution often requires translating emissions of air pollutants into ambient concentrations, then into exposures of people to those concentrations, and finally into health effects via dose-response functions. It is far easier and more direct to simply estimate the amount of material that is released. For example, although minimizing contribution to global warming might be an objective, an easier-to-estimate performance measure is the number of tons of greenhouse gases released to the atmosphere. Providing educational opportunities to increase the knowledge and capability of staff may be an objective, but it is easier to measure the number of person-hours of training delivered. Improving customer perceptions of the quality of service might be an objective, but the number of customer complaints is easier to count.
Proxy measures share many qualities of natural measures. They typically use a unit of measurement that is in common use and that can be counted or physically measured. Proxy measures are typically fairly easy to identify, but for a proxy measure to be a reasonably accurate measure for an objective, it needs to correlate well with achievement of that objective. Oftentimes there is little or no data to substantiate that a given proxy measure correlates with the achievement of a complex objective.
The third type of performance measure is a constructed scale. A constructed scale is a scale designed specifically for obtaining direct estimates provided by experts ("scorers") of performance relative to an objective. Typically used for multi-dimensional objectives, a constructed scale consists of a set of integer numbers (e.g., a 1-to-5 scale) with each number representing a level on the scale. Descriptions are created to define and distinguish the levels. Note that unlike natural and proxy measures, constructed scales are unitless.
As an example, Figure 20 shows a constructed scale used to prioritize projects conducted by a water utility for which one of the objectives was "maintain and improve water quality." The constructed scale was originally developed for measuring drinking water quality.
Figure 20: A constructed scale for water quality.
How to Create a Constructed Scale
The following steps are recommended for creating a constructed scale:
Constructed Scales Typically Require Scaling Functions
Be prepared in a later step of the model-construction process to assign relative values to the levels of constructed scales. Unless specifically designed otherwise, a constructed scale will be non-linear in value. For example, unless you've taken care to make it so, your description of a "10" is not likely to be perceived as exactly twice as good as your description of a "5." Unless the scoring scale has been designed to be linear in value, the model will need to include scaling functions. A scaling function provides the relative value of the performance specified by the description of each scale level. Ideally, the scale levels should be set to relate to one another in terms of value in a simple way, such as linear or exponential (more discussion is provided on the page describing methods for assessing single-attribute value functions). Since some of the factors will generally be quantitative, you can typically set the quantitative levels to define the scale increments and then write the qualitative descriptions to reinforce and conform to the level specified by the quantitative factors. Performance relative to the natural measures provided for fundamental objectives is almost always linear in value (e.g., three deaths are three times worse than one death). Therefore, scaling functions can generally be designed to assign value based on the relative number of units of the natural measure in the definitions of the scale levels (see below).
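As a rough illustration of a scaling function built from the quantities written into the level definitions, the sketch below assigns each level a value proportional to the number of units of a natural measure named in its description. The function name `scaling_function`, the 1-to-5 scale, and the quantities are hypothetical; an actual scale would take these numbers from its own level definitions.

```python
def scaling_function(level_quantities):
    """Map each scale level to a 0-to-1 relative value.

    Assumes value is linear in the natural-measure quantity that appears
    in each level's definition, so value is proportional to quantity.
    """
    top = max(level_quantities.values())
    if top <= 0:
        raise ValueError("at least one level must have a positive quantity")
    return {level: qty / top for level, qty in level_quantities.items()}


# Hypothetical quantities (units of some natural measure) for a 1-to-5 scale.
levels = {1: 0, 2: 10, 3: 50, 4: 200, 5: 1000}
values = scaling_function(levels)
```

With these illustrative numbers, a score of 4 carries one fifth of the value of a score of 5, not four fifths, which is why raw scores generally should not be treated as values.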
Influence Diagrams Provide the Factors for Defining the Levels of Constructed Scales
Influence diagrams make it easy to create constructed scales. The factors in the influence diagram provide the descriptors for writing text to differentiate the levels of the scale. To provide an example, the constructed scale below was derived from an influence diagram developed by a group of stakeholders for use in a prioritization of alternative approaches to cleaning up one of Canada's largest hazardous waste sites. The considerations referenced in the scale include: numbers and seriousness of illnesses and injuries; whether or not exposures to hazards and, therefore, health consequences could be attributable to poor judgment by those exposed; numbers and types of violations of health and environmental standards; and numbers of fatalities. These are all factors in the influence diagram for this objective constructed by the participants in the prioritization effort.
Figure 21: Scoring scale for health and safety.
Note that the scale levels have been defined such that the numbers of fatalities (highlighted in red) increase by an order of magnitude for every 2-unit increase in the scale level. This relationship was established so that the scaling function for converting a score provided on the scale to the value of that score would be a simple exponential function. I typically use exponentially increasing scales for performance measures that must span multiple orders of magnitude of measurement. Pairing order-of-magnitude increases in numbers of fatalities with an exponential scaling function assumes that (negative) value is linearly proportional to numbers of fatalities.
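The relationship described above can be sketched as a short function. Only the tenfold increase per 2-unit step comes from the scale described in the text; the reference score, the reference number of fatalities, and the value per fatality are placeholder assumptions, since the actual anchor points are defined by the scale in Figure 21.

```python
def implied_fatalities(score, reference_score=1, reference_fatalities=1.0):
    """Fatalities implied by a scale score.

    Assumes a tenfold increase for every 2-unit increase in score.
    The reference point (score 1 <-> 1 fatality) is hypothetical.
    """
    return reference_fatalities * 10 ** ((score - reference_score) / 2)


def scaled_value(score, value_per_fatality=-1.0):
    """Value of a score, assuming (negative) value is linear in fatalities."""
    return value_per_fatality * implied_fatalities(score)


# Each 2-unit step in score multiplies the implied consequence by ten.
ratio = implied_fatalities(5) / implied_fatalities(3)
```

Because value is assumed to be linear in fatalities, the scaling function inherits the exponential form of the scale: a 1-unit step multiplies the value by the square root of ten, roughly 3.16.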
Defined Impact Scales
A constructed scale of the above type is called a defined impact scale. A defined impact scale combines the multiple factors needed to characterize an outcome into a small number of mutually exclusive descriptions. The descriptions for the various levels of the scale are written so as to eliminate as much ambiguity as possible. However, the definition provided for each scale level corresponds to just one of many possible combinations of factor outcomes that would lead to a similar judgment of seriousness or significance. To help convey the idea that the description provided for each scale level is simply an example, I'll often introduce a defined impact scale with an instruction such as, "Select the score whose definition seems to you most similar in terms of significance to the outcome that you expect from the project."
An often available alternative to a defined impact scale is a pair of scales based on the quantity-intensity structure often seen in influence diagrams. The figure below shows the pair of input assessments derived from the influence diagram for stakeholder relations shown on the previous page. The scales were created for a project prioritization system for a municipal transportation agency. The agency has many stakeholder groups that often have strong opinions about the specific projects that the agency should or should not conduct. The quantity-intensity scales help the agency document stakeholder concerns and account for those concerns when establishing project priorities.
Figure 22: Quantity-intensity scoring scales for stakeholder relations.
Assessing a project's performance relative to stakeholder relations via the above scales requires four sets of inputs. First, the scorer identifies stakeholders known to be in favor of the project and denotes them by checking the checkboxes on the left-hand input sheet. Predictions of how those stakeholders would react, depending on whether or not the decision is made to conduct the project, are entered by providing two scores using the scale on the right. After entering inputs for stakeholders in favor of the project, the assessments are repeated for stakeholders known to be opposed to the project. A collective stakeholder weight is obtained by combining weights assigned to the concerned "key" and "other" stakeholders. The combined weight is then multiplied by a scaled value determined by the score on the intensity scale. The resulting stakeholder relations values for the project then become a component of the total value computed for the project.
Quantity-intensity scales work for just about any objective for which the magnitude of achievement depends on impacts of varying levels of significance to something that can be counted. The quantity component defines the scope of impact for the project. The intensity component measures the magnitude of impact experienced by an average item within the scope of impact.
Quantity-intensity scales are often used for environmental objectives, as can be seen by the quantity-intensity decomposition of the sample influence diagram for an environmental objective shown on the previous page. As illustrated in that diagram, influencing factors are divided into those factors indicating the number, importance, and sensitivity of the environmental resources at risk (quantity), and factors indicating the seriousness of the environmental impact to the at-risk resources (intensity). The quantity-intensity approach likewise works for objectives related to worker and public health and safety, socio-economic impacts, and impacts on the skills and capability of employees of the organization.
Mini Value Models
With the intensity-quantity approach, the measure of intensity is multiplied by the measure of quantity to arrive at a weighted index intended to measure the achievement of an objective. In effect, the weighted index is a mini value model constructed for a single objective. As with other value functions, independence conditions are required to justify the multiplicative form used for the weighted index. The necessary independence condition is that the value per unit change of the quantity metric must be proportional to the relative value of the intensity level. Also, the value per unit change in intensity must be directly proportional to the measure of quantity. In most cases, the assumption can be justified based on the fact that projects are unlikely to produce such large changes in the numbers and intensity of impact as to result in non-linearities with respect to value.
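A minimal sketch of such a weighted index follows. The intensity levels, their scaled values, and the names `INTENSITY_VALUES` and `weighted_index` are assumptions made for illustration; in a real system the scaled values would come from the scaling function assessed for the intensity scale.

```python
# Hypothetical scaled values for the levels of an intensity scale.
INTENSITY_VALUES = {0: 0.0, 1: 0.2, 2: 0.5, 3: 1.0}


def weighted_index(quantity, intensity_score):
    """Multiplicative mini value model: quantity times scaled intensity.

    `quantity` is the scope of impact (e.g., a count or a combined weight);
    `intensity_score` is a level on the hypothetical intensity scale above.
    """
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    return quantity * INTENSITY_VALUES[intensity_score]


base = weighted_index(quantity=40, intensity_score=2)
doubled = weighted_index(quantity=80, intensity_score=2)
```

The multiplicative form builds in the independence condition discussed above: at a fixed intensity level, doubling the quantity doubles the index, and at a fixed quantity, the index scales with the value assigned to the intensity level.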
Importance of Defining Scale Levels as Precisely As Possible
The description of each level of a constructed scale should be made as clear and precise as possible. The goal is to minimize inconsistencies and inaccuracies in scores due to vagueness in the definition of the levels of the scale. People often assume that scores provided via constructed scales are subjective and vague, whereas natural measures are clearer and more objective. That is certainly the case if the scale is specified as, for example, a 1-to-5 scale with no explanation of what the various numbers mean, but it should not be the case when precise definitions have been provided for each level of the scale.
One of the lessons I try to teach the project portfolio management (PPM) team when pilot testing a priority system is to use the details of what's written in the definitions of the levels of the scale to test whether a scorer truly believes that the chosen score applies to the estimate being made. Over and over I'll ask scorers, "Do you really believe that this project will...[insert the description of scale level selected by the scorer]?" Early in the test, such quizzing will cause some scorers to change a project's score to one less favorable to the project. However, when scorers see that they are expected to defend their choice of scores based on what's written on the scales, they soon realize that scores will be taken seriously. Using this questioning technique demonstrates to those who will be providing scores that they should assume that they will be held accountable for the estimates they provide. That doesn't mean that outcomes have to match exactly the descriptions provided by the scales. However, it does mean that at the time a scorer assigns a score based on a constructed scale, the scorer truly believes that the description corresponding with the chosen scale level best matches the scorer's beliefs about the project and what it will accomplish.
Using Constructed Scales to Aid Estimation of Natural and Proxy Measures
One other useful application for constructed scales is worth mentioning. Although the main strength of a constructed scale is providing a way to measure achievement for hard-to-quantify, multi-faceted objectives, constructed scales can sometimes be useful even for objectives having natural measures. Picking a number from a list or scale is easier, for most people, than deciding what number to enter into an empty field on an input sheet of a project selection tool. Also, explanations, helpful hints, and other forms of assistance can be provided in the definitions of scale levels. Providing a constructed scale for a natural measure helps the scorer feel more confident in judgments provided, and may result in judgments that more accurately reflect the scorer's true beliefs.
To illustrate, the likelihood that some event will occur is often a consideration relevant to the prioritization of projects. Likelihood of success, for example, is often a critical consideration for prioritizing risky projects. When prioritizing research and development projects for a pharmaceutical company, for instance, the likelihood that necessary regulatory approvals will be obtained is a key consideration. Likelihood has a natural measure; namely, probability. Even so, constructed scales are often created to facilitate the assignment of probabilities. Figure 23 provides an example of a constructed scale for likelihood. The scale illustrates three references that might be used to help a scorer express his or her subjective degree of belief for an uncertain event: a percentage, a visual reference based on a probability wheel, and words.
Figure 23: Scoring scale for estimating likelihood.