Source_lines_of

'''Source lines of code''' ('''SLOC''') is a software metric used to measure the amount of code in a software program. SLOC is typically used to estimate the amount of effort that will be required to develop a program, as well as to estimate productivity or effort once the software is produced.

==Measuring SLOC==
There are two major types of SLOC measures: physical SLOC and logical SLOC. Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of "non-blank, non-comment lines" in the text of the program's source code. Logical SLOC measures attempt to measure the number of "statements", but their specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to formatting and style conventions. Unfortunately, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.

Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:

 for (i=0; i<100; ++i) printf("hello"); /* How many lines of code is this? */

Depending on the programmer and/or coding standards, the above "line of code" could be, and usually is, written on many separate lines:

 for (i=0; i<100; ++i)
 {
     printf("hello");
 } /* Now how many lines of code is this? */

Even the "logical" and "physical" SLOC values can have a large number of varying definitions. Robert E. Park (while at the Software Engineering Institute) et al. developed a framework for defining SLOC values, to enable people to carefully explain and define the SLOC measure used in a project.
For example, most software systems reuse code, and determining which (if any) reused code to include is important when reporting a measure.

==Origins of SLOC==
At the time that people began using SLOC as a metric, the most commonly used languages, such as FORTRAN and assembler, were line-oriented languages. These languages were developed at the time when punch cards were the main form of data entry for programming. One punch card usually represented one line of code. It was one discrete object that was easily counted. It was the visible output of the programmer, so it made sense to managers to count lines of code as a measurement of a programmer's productivity. Today, the most commonly used computer languages allow a lot more leeway for formatting. One line of text no longer necessarily corresponds to one line of code.

==Usage of SLOC measures==
SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC; that is, programs with larger SLOC values take more time to develop. Thus, SLOC can be very effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to develop the same functionality with far less code, so a program with fewer SLOC may exhibit more functionality than another similar program. In particular, SLOC is a poor productivity measure of individuals, since a developer can write only a few lines and yet be far more productive in terms of functionality than a developer who ends up creating more lines (and generally spending more effort). Good developers may merge multiple code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Also, especially skilled developers tend to be assigned the most difficult tasks, and thus may sometimes appear less "productive" than other developers by this measure.
SLOC is particularly ineffective at comparing programs written in different languages unless adjustment factors are applied to normalize languages. Various computer languages balance brevity and clarity in different ways; as an extreme example, most assembly languages would require hundreds of lines of code to perform the same task as a few characters in APL.

Another increasingly common problem in comparing SLOC metrics is the difference between auto-generated and hand-written code. Modern software tools often have the capability to auto-generate enormous amounts of code with a few clicks of a mouse. For instance, GUI builders automatically generate all the source code for a GUI object when an icon is simply dragged onto a workspace. The work involved in creating this code cannot reasonably be compared to the work necessary to write a device driver, for instance.

There are several cost, schedule, and effort estimation models which use SLOC as an input parameter, including the widely used COnstructive COst MOdel (COCOMO) series of models by Barry Boehm et al. and Galorath's SEER-SEM. While these models have shown good predictive power, they are only as good as the estimates (particularly the SLOC estimates) fed to them. Many have advocated the use of function points instead of SLOC as a measure of functionality, but since function points are highly correlated to SLOC (and cannot be automatically measured) this is not a universally held view.

According to Gary McGraw, the SLOC values for various versions of Microsoft's Windows operating system are as follows (estimating from his graph; he does not specify if these are physical or logical measures or a mixture):

{| cellpadding="3" cellspacing="0" border="1" style="border-collapse: collapse" summary="Microsoft Windows SLOC Sizes"
|- bgcolor="#cccccc"
! Year || Operating System || SLOC (Million)
|-
| 1990 || Windows 3.1 || 3
|-
| 1995 || Windows NT || 4
|-
| 1997 || Windows 95 || 15
|-
| 1998 || Windows NT 4.0 || 16
|-
| 1999 || Windows 98 || 18
|-
| 2000 || Windows NT 5.0 || 20
|-
| 2001 || Windows 2000 || 35
|-
| 2002 || Windows XP || 40
|-
| 2006 || Windows Vista || 50
|}

David A. Wheeler studied the Red Hat distribution of the GNU/Linux operating system, and reported that Red Hat Linux version 7.1 (released April 2001) contained over 30 million physical SLOC. He also determined that, had it been developed by conventional proprietary means, it would have required about 8,000 person-years of development effort and would have cost over $1 billion (in year 2000 U.S. dollars).

A similar study was later made of Debian GNU/Linux version 2.2 (also known as "Potato"); this version of GNU/Linux was originally released in August 2000. This study found that Debian GNU/Linux 2.2 included over 55,000,000 SLOC, and if developed in a conventional proprietary way would have required 14,005 person-years and cost US$1.9 billion to develop. Later runs of the tools used report that the following release of Debian had 104 million SLOC, and as of 2005 the newest release was expected to include over 213 million SLOC.

{| cellpadding="3" cellspacing="0" border="1" style="border-collapse: collapse" summary="Operating System SLOC Sizes"
|- bgcolor="#cccccc"
! Operating System || SLOC (Million)
|-
| Red Hat Linux 6.2 || 17
|-
| Red Hat Linux 7.1 || 30
|-
| Debian 2.2 || 56
|-
| Debian 3.0 || 104
|-
| Debian 3.1 || 213
|-
| Sun Solaris || 7.5
|-
| Linux kernel 2.6.0 || 6.0
|}

==SLOC and relation to security faults==
:"The central enemy of reliability is complexity" Geer et al.
:"Measuring programs by counting the lines of code is like measuring aircraft quality by weight."

A number of experts have claimed a relationship between the number of lines of code in a program and the number of bugs that it contains.
This relationship is not simple, since the number of errors per line of code varies greatly according to the language used, the type of quality assurance processes, and the level of testing, but it does appear to exist. More importantly, the number of bugs in a program has been directly related to the number of security faults that are likely to be found in the program.

This has had a number of important implications for system security, and these can be seen reflected in operating system design. Firstly, more complex systems are likely to be more insecure simply due to the greater number of lines of code needed to develop them. For this reason, security-focused systems such as OpenBSD grow much more slowly than other systems such as Windows and Linux. A second idea, taken up in both OpenBSD and many Linux variants, is that separating code into different sections which run with different security environments (with or without special privileges, for example) ensures that the most security-critical segments are small and carefully audited.

== Advantages ==
(a) Scope for Automation of Counting: Since a line of code is a physical entity, manual counting effort can easily be eliminated by automating the counting process. Small utilities may be developed for counting the LOC in a program. However, a code counting utility developed for a specific language cannot be used for other languages due to the syntactical and structural differences among languages.

(b) An Intuitive Metric: Lines of code serve as an intuitive metric for measuring the size of software because code can be seen and its effect can be visualized. A function point is more of an objective metric that cannot be imagined as a physical entity; it exists only in the logical space. In this way, LOC comes in handy for expressing the size of software among programmers with low levels of experience.
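As a rough illustration of point (a), the following Python sketch counts physical and logical SLOC for C-like source text. It is not taken from any existing tool: the function names are hypothetical, the physical measure is the "non-blank, non-comment lines" definition, and the logical measure is assumed to be the number of semicolons that fall outside parentheses, comments, and string literals, which is only one of many possible definitions.

```python
def strip_comments_and_literals(source: str) -> str:
    """Drop C-style comments and empty out string/char literals, keeping newlines."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if source.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            # Block comment: keep its newlines so the line structure survives.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append("\n" * source.count("\n", i, end))
            i = end
        elif ch in "\"'":
            # Literal: keep only the two quotes, honoring backslash escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(ch + ch)
            i = min(j + 1, n)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def physical_sloc(source: str) -> int:
    """Count non-blank, non-comment lines."""
    cleaned = strip_comments_and_literals(source)
    return sum(1 for line in cleaned.splitlines() if line.strip())


def logical_sloc(source: str) -> int:
    """Count semicolons outside parentheses (assumed 'statement-terminating')."""
    depth = 0
    count = 0
    for ch in strip_comments_and_literals(source):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif ch == ";" and depth == 0:
            count += 1
    return count


ONE_LINE = 'for (i=0; i<100; ++i) printf("hello"); /* How many lines of code is this? */\n'

MULTI_LINE = """\
for (i=0; i<100; ++i)
{
    printf("hello");
} /* Now how many lines of code is this? */
"""

if __name__ == "__main__":
    for name, text in (("one line", ONE_LINE), ("multi-line", MULTI_LINE)):
        print(name, "physical:", physical_sloc(text), "logical:", logical_sloc(text))
```

Under these assumptions the two layouts of the same loop differ in physical SLOC (1 versus 4) but agree in logical SLOC (1). The comment and literal handling is specific to C-like syntax, which reflects the caveat that a counting utility written for one language does not carry over to others.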
== Disadvantages ==
(a) Lack of Accountability: The lines-of-code measure suffers from some fundamental problems. First and foremost, it is inaccurate to measure the productivity of a development project using only the outcome of one of its phases (the coding phase), which usually accounts for only 30% to 35% of the overall effort.

(b) Lack of Cohesion with Functionality: Though experiments have repeatedly confirmed that effort is highly correlated with LOC, functionality is less well correlated with LOC. That is, skilled developers may be able to develop the same functionality with far less code, so a program with fewer LOC may exhibit more functionality than another similar program. In particular, LOC is a poor productivity measure of individuals, since a developer can write only a few lines and still be more productive than a developer creating more lines of code.

(c) Adverse Impact on Estimation: As a consequence of the fact presented under point (a), estimates based on lines of code can go badly wrong.

(d) Developer's Experience: The implementation of a specific piece of logic differs based on the level of experience of the developer. Hence, the number of lines of code differs from person to person. An experienced developer may implement certain functionality in fewer lines of code than a developer with relatively less experience, even though they use the same language.

(e) Difference in Languages: Consider two applications that provide the same functionality (screens, reports, databases). One of the applications is written in C++ and the other is written in a language like COBOL. The number of function points would be exactly the same, but aspects of the application would be different. The lines of code needed to develop the application would certainly not be the same. As a consequence, the amount of effort required to develop the application would be different (hours per function point).
Unlike lines of code, the number of function points will remain constant.

(f) Advent of GUI Tools: With the advent of GUI-based languages and tools such as Visual Basic, much development work is done by drag-and-drop and a few mouse clicks, with the programmer writing virtually no code most of the time. It is not possible to account for the code that is automatically generated in this case. This difference invites huge variations in productivity and other metrics across languages, making lines of code less and less relevant in the context of GUI-based languages and tools, which are prominent in the present software development arena.

(g) Far from OO Development: Lines of code make little sense in object-oriented development, where everything is treated in terms of objects and classes. Since an object, like a function point, represents both data and functionality, function point analysis (FPA) remains more relevant for object-oriented software development.

(h) Problems with Multiple Languages: In today's software scenario, a single language is rarely used for development. Very often, a number of languages are employed depending upon the complexity and requirements. Tracking and reporting productivity and defect rates poses a serious problem in this case, since defects cannot be attributed to a particular language once the system has been integrated. Function points stand out as the best measure of size in this case.

(i) Lack of Counting Standards: There is no standard definition of what a line of code is. Do comments count? Are data declarations included? What happens if a statement extends over several lines? These are the questions that often arise. Though organizations like SEI and IEEE have published some guidelines in an attempt to standardize counting, it is difficult to put these into practice, especially with new languages being introduced every year.
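The questions raised under point (i) can be made concrete with a small Python sketch that counts the same C fragment under three conventions. The fragment, the `count_lines` helper, and the convention names are all hypothetical; a "comment line" is assumed here to be a line whose first non-blank characters are `//`, which is far cruder than any published counting guideline.

```python
SNIPPET = """\
// print a greeting 100 times
int i;

for (i = 0;
     i < 100;
     ++i)
{
    printf("hello");
}
"""


def count_lines(source: str, *, count_blank: bool, count_comments: bool) -> int:
    """Count lines under one counting convention.

    Assumption: a comment line is one that starts with // after indentation.
    """
    total = 0
    for line in source.splitlines():
        text = line.strip()
        if not text and not count_blank:
            continue  # blank line excluded by this convention
        if text.startswith("//") and not count_comments:
            continue  # comment-only line excluded by this convention
        total += 1
    return total


# Three illustrative conventions; none of them is a standard.
CONVENTIONS = {
    "every line": dict(count_blank=True, count_comments=True),
    "non-blank lines": dict(count_blank=False, count_comments=True),
    "non-blank, non-comment lines": dict(count_blank=False, count_comments=False),
}

if __name__ == "__main__":
    for name, rules in CONVENTIONS.items():
        print(f"{name}: {count_lines(SNIPPET, **rules)}")
```

For this fragment the three conventions give 9, 8, and 7 lines respectively, and in every case the single `for` header that extends over three lines is counted three times. A reported figure is therefore hard to interpret unless the counting rules are stated alongside it.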
== Related terms ==
* KLOC: 1000 lines of code
* KDLOC: 1000 delivered lines of code
* KSLOC: 1000 source lines of code

== References ==
* {{cite web | author = González-Barahona, Jesús M., Miguel A. Ortuño Pérez, Pedro de las Heras Quirós, José Centeno González, and Vicente Matellán Olivera | title = Counting potatoes: the size of Debian 2.2 | url = http://people.debian.org/~jgb/debian-counting/counting-potatoes/ | work = debian.org | accessdate = 2003-08-12 }}
* {{cite journal | author = McGraw, Gary | title = From the Ground Up: The DIMACS Software Security Workshop | journal = IEEE Security & Privacy | date = March/April 2003 | volume = 1 | issue = 2 | pages = 59–66 }}
* {{cite journal | author = Park, Robert E., ''et al.'' | title = Software Size Measurement: A Framework for Counting Source Statements | journal = Technical Report CMU/SEI-92-TR-20 | url = http://www.sei.cmu.edu/publications/documents/92.reports/92.tr.020.html }}
* {{cite web | author = Wheeler, David A. | title = SLOCCount | url = http://www.dwheeler.com/sloccount | accessdate = 2003-08-12 }}
* {{cite web | author = Wheeler, David A. | title = More than a Gigabuck: Estimating GNU/Linux's Size | date = June 2001 | url = http://www.dwheeler.com/sloc | accessdate = 2003-08-12 }}
* {{cite web | url = http://www.ccianet.org/papers/cyberinsecurity.pdf | title = Cyber''In''security: The Cost of Monopoly | author = Geer, Daniel ''et al.'' | format = PDF | date = 2003-09-24 | accessdate = 2004-09-12 }} Page 14 discusses complexity and computing security.
* A discussion of complexity and security on the Web Application Security list
* Current results of SLOCCount on Debian