12/24/2011

Final Paper, enjoy!


Anbork.info presents,
Pain-Reduction Techniques
for Software Development

Andrew Bork
12/14/2011
                    


With an hour left before the deadline, tensions run high between partners Jim and Matt on Coding Team Alpha. Working separately across campus, they are communicating via gChat about a bug that neither of them can find. Jim suggests that the FOR loop may be running past the end of the array into segfault territory. A quick change of the parameters, compile, run, and … Success! No errors. Now for the test cases. Pass, Pass, Pass … Fail. “What happened?” asks Matt. “The last test case failed,” types Jim. “Let me take a look at it. Go ahead and commit your code,” says Matt. Jim commits the code to their shared repository and Matt updates his working copy. While merging the changes, he notices the conflict. His code detects the “to” keyword and has the regex backend working properly. The repository version does not. He compiles and runs his version. “Mine just passed all the tests,” types Matt. “I added the ‘to’ parser while you were working on that loop.” With all test cases passing, the final version is committed and tagged. Both members go on with their nights, deadline met.
This is just one example of a common problem that programmers working in groups face. But an error with version control software is hopefully the least of your worries. Bad documentation, inconsistent development techniques, and how to avoid that dang Subversion error above are a few of the topics I will discuss in this article.

How to Read This Paper

·         Each major section has a “Reducing the Pain” area, underlined and italicized, that tells you exactly what to do to make your life easier. Here is an example:

Reducing the Pain on the “How to Read This Paper” Section

Ø  Remember to read this so you know how to reduce the pain.
Or, in abbreviated format:

RTP – How to Read This Paper

Ø  Remember to read this so you know how to reduce the pain.

Disclaimer to the Reader

The information given in this article is not guaranteed to actually help you. Because people work and learn differently, you may hate a certain tool or love another one. Either way, the tools I list are my personal preferences for minimizing pain in software development.



Writing and Documentation of a Group Project

“Interpreting bad writing is like interpreting the bible, crazy people will start killing babies” - Oscar Wilde

Comments

Comments have been explained to me in a few different ways over the years. They can serve several functions within your code, and each is as valid as the next. The first is commenting in order to review your code. With this method, the comments are treated as pseudocode, and reading them gives you a literal understanding of the program. This style also lets you plan future code. The second is commenting to summarize or explain intent. This style is generally preferred over the first, as long as you follow good coding standards. A good explanation of intent commenting comes from McConnell:
“Good comments don't repeat the code or explain it. They clarify its intent. Comments should explain, at a higher level of abstraction than the code, what you're trying to do.” (McConnell, Code Complete)
Explain why you did something rather than telling the reader what you did. An example given in our class was a financial program written in an ancient language, COBOL. A past programmer had not commented on why vast amounts of money were being sent to certain people, and it turned out those people should not have received it. A third style of commenting is algorithmic description, in which one explains technical design choices: for example, why quicksort was chosen instead of merge sort. Finally, one can use comments for debugging purposes, such as commenting out debug flags and methods.

Working in Groups

Dividing a project among team members lets you finish it sooner. On my most recent assignment, my partner and I never formally discussed how to split the work, which led to a disorganized experience. My mentality was to sit down, work on it as much as I could, and hope we would finish early. Work assignments were never distributed, the heavier share of the workload fell on me, and we ended up turning in an incomplete project. Despite those negatives, working in a group had benefits. When I was out of town, my partner could still work on the assignment, and vice versa. Another benefit was version control, which removed any guesswork about whether each of us had the most up-to-date codebase. Wrangling email threads, swapping flash drives, and the always hilarious “emailing the code to yourself” are all methods that should not be used.
I will cover version control in depth later, so here I explain only what you need to know to understand what a commit message is. When you send your changes to a server-hosted copy of a project, the server asks you to describe them. The commit message is that text. It tells everyone else what you did and why you did it.
Commit Message
·         One-line summary of the main change
·         Why you made that change and every other change
When I was first learning to write commit messages, they were “less than stellar,” as my project partner put it. I would record the changes I made but fail to remember everything I had changed in that revision. I hoped the diffs of the files (see svn diff in the Subversion section) would explain what I forgot. The problem, though, was not that my partner could not understand the changes themselves, but that he could not tell why I had made them. He then gave me the format above for commit messages.
The first line is a summary of the main change, or a title. The following lines explain why the changes were made. With this format I eventually got the hang of writing verbose commit messages. The moral of the story is that no matter how small the change, someone at some time will have to understand it. It might even be you, coming back to a previous revision and trying to figure out when you made a certain change.

Reducing The Pain – Writing and Documentation and Groups

Ø  Comment to help others, not yourself.
Ø  Yes, work in groups. Divide the project early.
Ø  Commit Messages – Summary, followed by why.

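// GNU style
#include <stdlib.h>
#include <string.h>

static char *
concat (const char *s1, const char *s2)
{
  size_t len1 = strlen (s1);
  char *result = (char *) malloc (len1 + strlen (s2) + 1);

  if (result != NULL)
    {
      strcpy (result, s1);
      strcpy (result + len1, s2);
    }
  return result;
}

Physical Pain in Software Development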

GNU Style

Coding takes time, usually a lot of it, and it takes place in front of a computer. You might be sitting or you might be standing, but you are probably staring at a screen and using a keyboard. How the code looks on the screen affects not only how fast one understands it, but also how strained the eyes get over time.
For a recent project, my partner and I decided to switch to a common coding style for all of our code. We chose GNU style for two main reasons: the increased readability of its block statements, which are easier on the eyes, and its two-space indentation. Like the Allman and Whitesmiths styles, GNU style puts braces on a line by themselves. Braces are indented by two spaces, except when opening a function definition, where they are not indented. In either case, the contained code is indented by two spaces from the braces. The style is set out in the GNU Coding Standards and is used by nearly all maintainers of GNU project software.

The Dvorak Keyboard

All that time spent in front of a computer and keyboard can also take a toll on one’s wrists. The Dvorak keyboard layout was designed to help prevent repetitive strain injury and to reduce instances of carpal tunnel syndrome. The benefits are not limited to ergonomics, though. Because the home row holds the most used letters in English, around 70% of keystrokes land on the home row, which can make you a faster typist. Dvorak also requires only around 63% of the finger motion that QWERTY does, which is where its ergonomic advantage comes from. It is important to note that the layout is tuned to letter frequencies in English, so users typing in other languages may not see a benefit.

RTP – Physical Pain

Ø  Use GNU style.
Ø  Use the Dvorak keyboard.

Design Choices

Software Development methodologies (SDLC)

Waterfall SDLC

The traditional waterfall model of software design contains a number of distinct phases.
1.       Requirements Specification – Defining the problem and the expected outcome.
2.       Design - This is where a general plan is created for building the implementation.
3.       Implementation - The coding is done here.
4.       Testing and Debugging - Errors are located and removed.
5.       Delivery to the customer and maintenance of the project.
This SDLC can be compared to an assembly line in which each phase of the project is separate. However, one of its main drawbacks is how difficult it is to return to a previous stage. For example, if a requirement changes during step 4, one must go back to the design phase and then repeat the implementation phase. If one is absolutely sure of every requirement and every outcome of a simple project, then this can be an efficient method to use.

Iterative SDLC

The counterpart to the waterfall method is an iterative, or incremental, development scheme. In this system, a series of mini-waterfalls is completed, with new features added during each one. This method can also do some “Big Design Up Front” and then proceed to truncated mini-waterfalls: the initial software concept, requirements analysis, and architecture design are defined via waterfall.
An obvious advantage of this method is that one can easily change the project requirements. At the completion of each feature, one can go back and reevaluate what is needed. If large changes are needed, time is saved because the need was discovered sooner.

Defensive Programming

Defensive programming is not so much a design choice as a design necessity. Writing functional code involves writing code that uses valid input (if any) to achieve some function. Unfortunately, input is not always valid. The purpose of writing defensive code is to ensure that the program or method does not do something bad (i.e., crash or, worse, quietly return a bad value). There are many ways to do this. For example, before using an object reference, you should make sure the object is valid (i.e., not null). Before accessing an array, you should validate that the array index is within range (out-of-range indices are one of the more common causes of memory access errors). When using a case statement, you should include a default block in case the input value doesn't match any of the expected values.
While not difficult, writing defensive code is tedious. On a recent project we found that defensive coding added 5% to our coding time (i.e., for every 40 hours we spent writing functional code, we spent an additional two hours writing defensive code). For some types of projects, such as small projects, temporary applications, and projects where you are positive your data is clean (e.g., when you have a small set of long-time users who know what values to enter), investing in defensive programming might not make sense. But for projects where data quality is uncertain or the consequences of failure are high, the investment is worthwhile.

Understandable Code

When writing code, a good overall thought to keep in one’s head is KISS: “Keep It Simple, Stupid.” Simplicity and clarity should be paramount, because who knows who will look at your code next, let alone need to understand it. You cannot even rely on yourself to understand your own code after a period of time. This ties back into writing good comments. In a utopian programming world, one would write code so understandable that “you could read it to your grandmother and she would understand.” That is obviously not the case, so including necessary and helpful comments is important to make sure your code is understood.

RTP – Design Choices

Ø  When faced with a choice between waterfall and iterative, choose iterative.
Ø  Defend your code against Evil input!
Ø  Remember your Grandmother.


Software Development Tools

Version Control - Apache Subversion

·         Who? – Created by CollabNet
·         What? - Software versioning and a revision control system.
·         Where? – Now an Apache Project
·         When? – October 20, 2000
·         Why? - An effort to write an open-source version-control system which operated much like CVS but which fixed the bugs and supplied some features missing in CVS

On a recent project we were required to use a version control system called Subversion (SVN). It is mainly a command-line tool with which users update and commit changes to a shared code base. However, don’t let the command line scare you away: I did not use it once during our entire project. We mainly used an integrated development environment with a Subversion plugin, which eliminated most of the problems users run into. Because of this, I highly recommend Subversion for your next group project.
·         svn checkout — Check out a working copy from a repository.
This is the first thing one does when working with a repository of code. You check out a copy for making changes. Now how do you put those changes back?
·         svn commit — Send changes from your working copy to the repository.
It is important to note a common problem users face when committing new files to the repository. On the command line, one must first run “svn add” on the new file; only then can it be committed. In my IDE, however, this step was not needed, as it was done automatically.
SVN's best feature is the ability to work on a shared code base. The main line of that code base is called the Trunk. The Trunk is where the full copy of your code lives at all times. When one commits code to SVN, one is adding changes to the Trunk.
·         svn update — Update your working copy of the code from the repository.
Updating will bring the latest version of the code from the repository to your machine. From that point the developers using the repository can make changes to the updated versions and then commit those changes to create a new, updated version of the code.
These two commands are great but lead to one issue. If two team members change the same file and both try to commit, the second one will hit a snag because his or her working copy is out of sync with the repository. The situation can be resolved with what is called a merge.
·         svn merge — apply the differences between two sources to a working copy path.
A merge is when a developer compares the changes in his or her local version of a file with the changes made to the Trunk version that have not yet reached the local version, and fixes the synchronization issue by amending the code to include both sets of changes or just one of them. There are three possible outcomes of this process.
Ø  Override and Update: The local version of the file is discarded and the Trunk code overwrites everything.
Ø  Override and Commit: This is similar to Override and Update, but the Trunk code is overwritten with the local version.
Ø  Merge Changes: Both sets of changes are merged into the local version, and the developer can then commit the file and overwrite the Trunk file.
Now you might ask, “How am I supposed to know what the differences between the files are?” Well, SVN has a command for that too.
·         svn diff  — Display the differences between two paths (files).
During the process of committing, updating, and merging, the code base can change quite drastically. If a problem arises, having a backup is key to continuing a project without major losses. SVN provides a way to back up your code via Tags.
A Tag is a snapshot of your code at a given time. It is similar to the Trunk, but it is assumed that a Tag will not change in the future. This is useful if it becomes necessary to revert your code base to a prior state. In larger environments, a Tag is created with each build: version 1.0, and so on. Luckily, we have not yet had to revert to a tag in our project.
In case you need to offshoot your code for some reason, say to introduce a new developer to your code without having them work in the Trunk, SVN has something called Branches. A Branch is a separate, Trunk-style copy of the project used for side work. Branches can be created from existing code bases, including the Trunk and Tags. Even better, you can merge between a Branch and the Trunk to bring changes from the temporary copy into the main code base.

Testing frameworks

Unit testing is an important part of designing and developing code. Here are five reasons why.
1.       Unit testing lets you test your code all the time, in an easy way -- instant gratification.
2.       Starting unit testing early in the process of writing your code leads to a better design. Your method names and classes will be shaped by the tests you write, which helps you organize your code.
3.       Unit testing lets you change your code easily later down the line. A good baseline of tests lets you refactor with confidence, because rerunning the tests tells you whether the code still works.
4.       Since unit tests actually “test your code,” you can “try to break it.” This gives you an understanding of your code that you would not have without testing.
5.       And the most important (for your boss): money. Fixing a bug early is far cheaper than fixing it very late in the development process.

CxxTest is a very nice way to write and run unit tests in C++. A Perl script generates .cpp files from your test classes, and you then compile and run those. You therefore do not have to declare your tests in your regular code and can simply attach the framework to do your testing. It even has a plugin for Eclipse that lets you write and run tests through a GUI.
Some of the features (as taken from its own description):
·         Doesn't require Runtime Type information.
·         Doesn't require member template functions.
·         Doesn't require exception handling.
·         Doesn't require any external libraries (including memory management, file/console I/O, graphics libraries.)
·         Is distributed entirely as a set of header files.
In my experience, CxxTest was unruly at times because a failing test did not always print a useful error message. The run would simply break, and I had to go in manually and add a “cout << error message;” to the end of the test.
Tests are written mainly with a macro called TS_ASSERT(). It works much like the standard assert() macro. You just put whatever expression you want to test inside it, say,
     TS_ASSERT( 1 + 1 > 1 );     or     TS_ASSERT_EQUALS( square(2), 4 );

Makefiles

Makefiles are a class of expert system, where an expert system is a computer system that emulates the decision-making ability of a human expert. The general idea is that make supports minimal rebuilds. You tell it which parts of your program depend on which other parts, and when you update some part of the program, it rebuilds only the parts that depend on it. While you could do this with a shell script, it would be a lot more work (explicitly checking the last-modified dates on all the files, etc.). The only obvious alternative with a shell script is to rebuild everything every time. For tiny projects this is a perfectly reasonable approach, but for a big project a complete rebuild could easily take an hour or more, whereas with make you might accomplish the same thing in a minute or two.
In my experience, makefiles are complicated and have a steep learning curve. They also depend strictly on formatting specifics: each command line in a rule must begin with a tab character, not spaces. I used them in a different manner on a recent project, though. My IDE (see the next section) generated them automatically, and I just modified the makefiles for my own uses. Even with their negative aspects, makefiles create a repeatable process, so that one does not have to reinvent the wheel with every change to the code. Any time a process can be repeated, it saves both time and money.

Eclipse, Debugging, and IDEs

As I hinted earlier, I have been using Eclipse ever since my second computer science class. But what are the real benefits of IDEs, integrated build tools, and debugging? Everything I have listed so far in this paper shows up in this amazing IDE. That alone is the benefit: singular integration. Most of the features listed below also occur in other IDEs, but these are my reasons for using Eclipse.
·         An IDE introduces the concept of a project workspace. A workspace is just what it sounds like: a place to keep all of your relevant documents. The IDE knows they are all connected, so operations can be applied across all of the files.
·         Eclipse has automatic formatting and indentation correction, so I just click a button or press a keyboard shortcut to format all of the code in the project in GNU style.
·         It lets you right-click on a variable and rename it in multiple locations at once. This saves vast amounts of time: you do not have to go searching for every instance of your hidden “getter/value” method or variable in separate files and folders. Everything in the project workspace is searched and corrected automatically.
·         Depending on the language you are working in, Eclipse will generate classes, methods, and general code for you if you supply the interface or requirements definition.
·         Eclipse has seamless SVN integration.
·         Eclipse has CxxTest integration.
IDEs in general offer autocompletion. Your IDE knows the possible choices in whatever context you are typing, and it suggests which methods to use.

Debugging

Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware, thus making it behave as expected. Debuggers usually use a concept called breakpoints to stop at certain lines of code so that one can examine them. A typical debugging process begins with reproducing the original problem and trying to simplify it down so that a certain part of the code is deemed a likely spot that caused the problem. Debugging can be performed via a Trace, or live, while accessing program states. Print (or tracing) debugging is the act of watching (live or recorded) trace statements, or print statements, that indicate the flow of execution of a process. This is sometimes called printf debugging, due to the use of the printf statement in C.
In Eclipse, the entire process is contained within a debugging view. You can view current variable states and follow function calls up and down the stack. Every step the program takes can be viewed in order. To set a breakpoint, you just double-click in the margin next to the line of interest. The only problem I found with debugging in Eclipse is that it searches your workspace $PATH for your source files and your system $PATH for standard library source files. If it cannot find them, it does not display the source code while you step through the program.

Reducing The Pain – Software Development Tools

Ø  Subversion
o   If using the command line and you have a new file, always “Add” before you “Commit”
o   Use an IDE with Subversion integration. Recommendation – Eclipse
Ø  Testing
o   When using CxxTest, make sure your makefile includes the library from the correct top-level directory.
Ø  Makefiles
o   Don’t deal with them. Have your IDE do all your compilation and run-testing.
Ø  Debugging
o   Again, steer away from archaic tools such as GDB and stick with your IDE.
