Measure out costs

Work measurement tools could cut software development prices by 30 percent

By Murali Chemuturi

Industrial engineers have been measuring various activities of human endeavor in manufacturing and construction for a long time. It is accepted that wherever there is a need for norms or baselines for the pace of working, IEs can do it. IEs specify the standard time for many activities of human endeavor. The industrial engineering fraternity has developed a number of techniques for work measurement, such as time and motion study, synthesis, analytical time estimation and work sampling. These techniques have provided credible results in fields that include service industries, construction industries and manufacturing industries, including mass production, batch production, flow-process production and job order production. The manufacturing industry also includes computer hardware manufacturing, where work measurement by IEs is well-accepted. But when it comes to software development, the scenario is entirely different.

The evolution of the software business

The computer industry is of recent origin, having developed after World War II. While other machines have only a hardware component, computers are driven by two components: hardware and software. Thus, the computer industry includes manufacturing operations for the hardware and programming operations to develop the software. Early on, the programming was handled by scientists, and the software component of the cost was minor, typically about 20 percent. Software was supplied with, and tightly coupled to, the hardware. It consisted of two components: system software and application software.

When Apple Computer started offering microcomputers for home use, it employed a separate company, Digital Research, to develop software for home computers. With every Apple computer sold, one copy of Control Program for Microcomputers (CP/M) was sold to the customer. This was the major step in hiving off software development as an independent business. Although the microcomputers originally were targeted at homes, the word processing program WordStar took them into offices and journalism, and a small database management system, dBase II, took them into other areas of business. Many more word processing programs hit the market from independent software developers. All of this helped establish software development as an independent business.

But the real impetus came from IBM’s 1981 introduction of the PC (personal computer), which had much greater computing power than its competitors. The PC stormed the manufacturing industry, moving computing from electronic data processing departments to end users. This unleashed an unprecedented opportunity for independent software developers, firmly decoupling software development from hardware. Hardware manufacturers sold hardware, and users could select the software separately. Independent software developers mushroomed, and organizations that wanted specialized software for their operations had more options.

Whenever there is a vendor-vendee relationship for specialized services, the issue of pricing comes in, and whenever a price is to be set for an assignment, cost estimation must be carried out. In software development, the major component of cost is the human effort needed to execute the assignment. And to estimate human effort, we need productivity figures.

Software size measurement

Initially, software development was priced on a time-and-materials basis. In this pricing system, the software developer assigned a set of programmers to work at the customer’s location under the customer’s supervision and guidance. The customer paid the developer organization an hourly rate and reimbursed any out-of-pocket expenses incurred by the programmers. In time, the programmers worked from their employer’s premises, and payment was based on approved timesheets submitted by the vendor.

Obviously, this put customers at a disadvantage because they couldn’t estimate expenses to control budgets. It is not an exaggeration to say that in those days, budgets were set not to achieve a defined functionality but to cap expenditures. Whatever functionality could be achieved within the allocated budget was all that the organization could hope for. In time, customers began asking vendors to define what functionality would be delivered at a fixed price. But software engineers did not have accurate tools, comparable to engineering drawings, for defining the functionality of the proposed product before building it.

Initially, the number of lines of code (LOC) measured the size of the delivered software. But the industry could not come to a common definition of LOC. Some program lines had fewer than 10 characters, while others had more. Languages like COBOL and FORTRAN allowed up to 80 characters per line, but some later languages allowed up to 255 characters per line. Still, LOC was used effectively in COBOL-based software.

IBM worked on this problem and developed function points (FP) to measure the software size from a user’s perspective. This alleviated the issue of defining the size and functionality of the software to be delivered against the agreed price. Even today, some organizations agree to pay on a “per-FP” basis. They measure the size of the delivered software product and pay the vendor based on the number of FPs of software actually delivered. FP became popular enough to spawn a user group, the International Function Point Users Group (IFPUG). IFPUG conducts a certified function point specialist examination to certify people in measuring FP for software products. But to date, a conversion factor for deriving person-hours of human effort from FP eludes common agreement.

In time, other software size measures were introduced, including object points, use case points, FPA Mark II (function point analysis Mark II), feature points, Internet points, story points and software size units (SSU). Test points and software test units measure the size of software testing projects. Each measure is in use, and each one, including FP, has its merits, demerits, adherents and critics.

Problems with existing size measures

First and foremost, all size measures, with the exception of SSU, use complexity to increase the software size. Complexity never increases the size or amount of work to be completed. Rather, it reduces the productivity; in other words, complexity means it takes more time to complete a piece of work. For example, walking a mile on a well-paved street takes less time than walking a mile on a mountain trail, and trekking six miles on a mountain trail takes less time than climbing that same six miles up Mount Everest. Complexity reduces the rate of achievement (productivity); it does not increase the size (distance). Those who use complexity to inflate the quantum of work argue that it makes no difference: 10 function points multiplied by two hours each, or five FPs multiplied by four hours each, both come to 20 hours. But inflating the size shows a higher quantum of work, which suits software development organizations that charge based on size rather than productivity.
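To make the arithmetic explicit, here is a minimal sketch of the two treatments of complexity. The figures (a complexity factor of two, two hours per FP, a hypothetical per-FP price) are illustrative assumptions only, not drawn from any standard:

```python
# A minimal sketch contrasting two treatments of complexity.
# All figures are illustrative assumptions, not published norms.

BASE_SIZE_FP = 5          # functionality actually delivered, in function points
COMPLEXITY_FACTOR = 2.0   # assume the work is "twice as hard" as normal
HOURS_PER_FP = 2.0        # baseline productivity for normal-complexity work
PRICE_PER_FP = 500.0      # hypothetical price charged per delivered FP

# Treatment 1: inflate the size (what most size measures do).
inflated_size = BASE_SIZE_FP * COMPLEXITY_FACTOR                    # 10 FP
effort_inflated = inflated_size * HOURS_PER_FP                      # 20 hours
bill_inflated = inflated_size * PRICE_PER_FP                        # billed on 10 FP

# Treatment 2: keep the size, reduce the productivity (the IE view).
effort_adjusted = BASE_SIZE_FP * (HOURS_PER_FP * COMPLEXITY_FACTOR)  # still 20 hours
bill_adjusted = BASE_SIZE_FP * PRICE_PER_FP                          # billed on 5 FP

print(f"Effort: {effort_inflated} vs. {effort_adjusted} person-hours (identical)")
print(f"Bill when charging per FP: {bill_inflated} vs. {bill_adjusted}")
```

The effort comes out the same either way; only the billed size differs, which is precisely why inflating size suits organizations that charge per FP.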

Second, there is no agreed conversion factor between these size measures. We can convert kilometers into miles or pounds into kilograms and vice versa. No conversion factor converts function points into use case points and vice versa. So if two people measure the size of the same software product using two different size measures (say FP and object points), we do not know if either sized the product accurately.

Third, while proponents claim these size measures are easy to use, that is not true. The very fact that certification is needed indicates their complexity; we do not have certification requirements for cost estimation in other industries. IFPUG runs a discussion board on FP, and the sheer number of queries and diversity of opinions indicates the extent of confusion.

Other issues exist, but they are beyond the scope of this article.

Software size and person-hours

Function points came into use in the late 1970s. Still, there is no accepted way to convert an FP into person-hours, and the same is true for other software size measures. Seminars have been conducted, articles written and books penned about software estimation. Some professional associations, such as IFPUG and the Software Process Improvement Network, tried to benchmark a productivity figure but could not. One website puts the effort to produce one FP anywhere from two person-hours to 135.
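To see what such a spread means for a bid, here is a minimal sketch; the project size is a hypothetical assumption, and the two-to-135 range is the one quoted above:

```python
# A minimal sketch of why a 2-to-135 person-hours-per-FP spread is useless
# for bidding. The project size below is a hypothetical assumption.

project_size_fp = 500
hours_per_fp_low, hours_per_fp_high = 2, 135

low_estimate = project_size_fp * hours_per_fp_low     #  1,000 person-hours
high_estimate = project_size_fp * hours_per_fp_high   # 67,500 person-hours

print(f"Effort estimate: {low_estimate:,} to {high_estimate:,} person-hours, "
      f"a factor of {high_estimate / low_estimate:.1f}x")
```

A range spanning nearly two orders of magnitude cannot support a fixed-price commitment.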

But none, other than this author, has suggested that industrial engineering methods be used to set productivity norms for software development activities. Work study could be used to set those norms.

Most authors wanted to derive productivity from an organization’s historical records. One book suggested that productivity baselines be adjusted continually using the data of every completed project. Obviously, most people in the software development industry have not heard of the concept of “a fair day’s work for a fair day’s pay,” as software developers are paid far better than most.

Now what is the problem with that? They estimate costs, make profits and pay their staff well. Why should we begrudge them? Unfortunately, the cost of software is much higher than it should be. If IEs went to work on software development methods and productivity, costs in the U.S. alone could be cut by 30 percent, and outsourcing could be reduced.

No credible published data gives productivity figures for software development. The International Software Benchmarking Standards Group sells some data, but they are mostly benchmarks based on completed projects. IEs know that past performance is not a reliable indicator of future performance, especially in changing environments.

Productivity vs. capacity

IEs know what productivity is, and they express it in terms of standard time. Productivity is used in the context of human endeavor and is applied at the smallest possible granularity. They also know what capacity means: the throughput of a facility, achieved by differently skilled professionals performing multiple disparate and dissimilar activities at different workstations.

When we say that a car factory produces 1,000 cars a day, we mean the throughput of the factory is 1,000 cars per day. We do not say that the productivity of the factory is 1,000 cars per day. But we specify productivity for a threading operation on a lathe machine. Productivity comes with a set of adjectives such as the type of operation, the type of tools and workstation used, the methods of working and so on. We specify capacity without such adjectives.

A couple of fertilizer plants may each have the capacity to produce 100 tons per day, even though they use entirely different technologies, manpower, fuels, machinery and methods. We need not specify such details when expressing capacity. But the productivity of different types of lathe machines in different shops cannot be the same unless all factors are identical.

Productivity is specific to an organization, to the equipment used, the methods in practice and the tools used. It is micro in nature. It is for human endeavor. Capacity is macro; it is for a facility. Capacity comprises multiple activities performed by multiple people. Capacity is throughput.

Capacity is designed and productivity is measured. The term “capacity” is used in the context of a facility, and the term “productivity” is used in the context of human endeavor.

Productivity can be improved without adding equipment or manpower. Capacity cannot be increased without adding equipment or manpower.

Right now, the software development industry means capacity when it uses the term productivity. It states productivity as 100 LOC per day or 0.7 FP per day. This usage is erroneous, at least from the industrial engineering standpoint.

Software, like any other deliverable, needs multiple disparate activities to develop it. Software development begins with a feasibility study to determine if computers should be used, what it would cost and its probable benefits. The next activity defines the proposed software’s user requirements, collating them from users of the present system and documenting them along with auxiliary requirements for ensuring security, ease of use, protection against malicious (intended or unintended) use, intruder prevention, reliability and safety.

Using the user requirements document, a software requirements specification is prepared detailing the high-level design, software architecture, and screen and report designs. Next, the software design description is prepared, containing the detailed program design, the detailed design of screens and reports, test plans and test cases. The developer then produces the source code and executable code and performs unit testing, integration and integration testing. The last activity before delivery is additional testing such as system testing, acceptance testing, load testing and parallel testing. This software engineering methodology comes from the Institute of Electrical and Electronics Engineers’ software engineering standards.

All these macro activities have their own set of micro activities. Each activity needs a differently skilled person, and no one person can carry out all these activities. Lumping all these activities together and specifying one single figure of productivity is not correct. But that is precisely what happens in the software development industry today.

Empirical methods vs. work measurement

Almost all software estimation theorists advocate using empirical methods to derive productivity for software development. But in software development, every few years brings a new operating system, a new programming language or a new version of an existing language. This limits the efficacy of empirical methods.

Take Windows, for example. It first was released in 1985 as an add-on to the MS-DOS operating system. Windows 2.0 was released in 1987 and, via version 2.1, became Windows 3.0 in 1990, which was upgraded to Windows 3.1 in 1992. Then came Windows 95, Windows 98, Windows 2000, Windows Millennium Edition, Windows XP, Windows Vista in 2007 and Windows 7 in 2009.

Environmental stability, a prerequisite for using empirical methods, is lacking in the software development field, except on the much more stable mainframe platforms. But fresh programming on mainframes is minuscule compared with the software development taking place on personal computers. To produce reliable results, empirical methods need a large base of data from which to extract meaningful generalizations, and those generalizations hold true only over a large number of cases.

For example, over a large number of tosses, a coin lands heads about 50 percent of the time. For any one toss, this probability is of no use in predicting the outcome. In software estimation, we need to predict each individual outcome, not an average over a large number of outcomes. It is not adequate to say that the average productivity is 10 hours per function point with a range from two to 135; we need to predict, for each project, what the effort will be within an acceptable level of error. Empirical methods do not give that kind of predictability; industrial engineering studies do.
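The coin-toss point can be seen in a minimal simulation; the numbers below are illustrative only:

```python
# A minimal sketch of the coin-toss point: the long-run frequency converges
# to 50 percent, yet that average predicts nothing about any single toss.
import random

random.seed(1)
N = 100_000
heads = sum(random.random() < 0.5 for _ in range(N))
print(f"Long-run heads frequency over {N:,} tosses: {heads / N:.3f}")  # ~0.500

# The next toss remains unpredictable; likewise, a long-run average of
# hours per FP says little about the effort of the next individual project.
next_toss = "heads" if random.random() < 0.5 else "tails"
print(f"Next toss: {next_toss}")
```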

It is no wonder that in the real world of software estimation, an oft-heard dictum is: “Estimate taking all variable values on the higher side, multiply the final value by 2.5 and then pray for it to be met.”

Because empirical methods derive productivity figures from historical data, they cannot tell us whether the past performance was the “right” performance. That performance could have been skewed anywhere from super-performance to poor performance.

The past performance may have been influenced by extreme conditions. Consider the following possibilities:

  1. The team may have been composed predominantly of either super-skilled resources (people with more than five years of experience) or poorly skilled ones (trainees or new graduates).
  2. There was a significant positive (for early completion) or negative (losing job if not completed in time) reward associated with the project.
  3. The client could have caused major delays.
  4. The requirements could have been severely volatile.
  5. The methods of software development could have been different in projects.
  6. The team may have been under stress and worked more hours per day.
  7. The tools used in projects could have been different.
  8. The software development cycle used could have been different in projects.
  9. There are about 35 varieties of possible software tests, but not all projects would use all the tests. So one project could have used 10 varieties of tests while another could have used only four. Others could have used many more tests.

We could list more factors, but they all affect the productivity derived using empirical methods. If we combine all such projects and empirically derive a productivity figure for use across the organization, how can the results be accurate? When we derive productivity by lumping a number of activities together, the impact of the factors enumerated above is severe. If we derive productivity at a more granular level, we will have a more reliable set of figures.

In empirically derived productivity (combining all activities including requirements analysis, software design, software construction and software testing together), the different methods used in performing these activities would not be considered at all. When we derive productivity at activity level, we can consider the different methods of performing the activities and thus have a better set of figures, for example:

  1. Requirements analysis using interviewing methods
  2. Requirements analysis using documents
  3. Requirements analysis using Delphi method
  4. Software design using IEEE methodology
  5. Software design using use case methodology
  6. Software design using OO (object-oriented) methodology
  7. Software construction using an IDE (integrated development environment)
  8. Software construction without using an IDE
  9. Software testing using manual methods
  10. Software testing using WinRunner testing tool
  11. Software testing using Rational Robot testing tool

The above list is not exhaustive. Using the synthesis technique, building an estimate up from activity-level productivity norms, we can estimate with much better accuracy as well as set targets with better confidence while carrying out the actual work.
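As an illustration, here is a minimal sketch of synthesis-style estimation. Every activity name, standard time and work quantity below is a hypothetical assumption, not a published norm:

```python
# A minimal sketch of synthesis-style estimation: effort is built up from
# activity-level productivity norms rather than one blanket figure.
# All activity names, norms and work quantities are hypothetical.

# Standard times per unit of work, by activity and method (hours per unit).
standard_times = {
    ("requirements analysis", "interviewing"): 1.5,   # hours per requirement
    ("software design", "object-oriented"):    3.0,   # hours per design unit
    ("construction", "with IDE"):              2.0,   # hours per program unit
    ("testing", "manual"):                     0.8,   # hours per test case
}

# Estimated quantities of work for a hypothetical project.
work_quantities = {
    ("requirements analysis", "interviewing"): 120,
    ("software design", "object-oriented"):     60,
    ("construction", "with IDE"):              200,
    ("testing", "manual"):                     350,
}

total = 0.0
for activity, qty in work_quantities.items():
    effort = qty * standard_times[activity]
    total += effort
    print(f"{activity[0]:<22} ({activity[1]:<15}): {effort:7.1f} person-hours")

print(f"{'Total':<41}: {total:7.1f} person-hours")
```

Because each activity carries its own norm for its own method of working, a change of method affects only that line of the estimate, not the whole figure.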

Final words

There is an argument that software development is a creative endeavor. It was so at the beginning, but it is not so anymore. Nowadays, programmers work with the assistance of a design document just as shop floor workers work with the assistance of a drawing. Software development methodologies have matured, and a body of knowledge has been established for developing software.

All in all, the software development industry is on an erroneous track when it comes to setting productivity norms for software development, norms that would be used in software estimation as well as in target setting during project execution. Somehow, IEs have not been utilized by this industry for this purpose. Perhaps IEs have not shown any initiative to aid the software development industry in setting productivity norms. The net result is that customers are overpaying, and the cost of goods is higher than it should be. Microsoft, according to published sources, sold 1 million copies of its operating system Windows Vista in one year. If a car sold that many units in one year, its price surely would be reduced by at least 20 percent. IEs need to get involved in this area.

Murali Chemuturi is a fellow of industrial engineering from the Indian Institution of Industrial Engineering. He has 40 years of work experience, with 15 years in manufacturing and the rest in information technology. He has handled software development, data processing, training and consulting. He wrote Software Estimation Best Practices, Tools and Techniques and has developed various software estimation tools. He presently leads Chemuturi Consultants.

SIDEBAR: Evolution of analysis

These days, work is more dynamic, and changes in responsibilities have accelerated, argue Edward L. Levine and Juan I. Sanchez in the 2007 volume of Ergometrika: A Journal of Work Measurement Research.

That means work analysis methods must change to maintain validity. The authors report that the increase in self-directed work teams and networks of people and machines organized into work systems mandates the addition of team or system level descriptors in addition to descriptors applicable to individual performers. And disciplines like industrial engineering, ergonomics and industrial and organizational psychology must have an interdisciplinary focus.

Work measurement specialists also should add targets or beneficiaries of work processes, including customers and clients, to their sources of data. The methods of collecting that data have advanced to include electronic performance monitoring and reviews of recorded phone calls to analyze critical incidents. The traditional surveys and interviews can be done online or via telecommunications.

As work and its context increase in complexity, so should the levels and units of work analysis. When reviewing a network, team or system, the measurer must include the activities, tasks and personal attributes of individual team members along with an analysis of team-level work flow and work sequencing. Specialists also can use theoretically derived dictionaries of descriptors for various aspects of the analysis to streamline it and make the analyzed information more valid.




© 2014 Institute of Industrial Engineers. All rights reserved.