Human Versus Machine Code Analysis

I see human code reviews as one tool in the quality toolbox. My opinion is that to keep code reviews interesting and engaging, humans should be the last link in the chain and get the most interesting problems. What I mean is that if the code review is burdened with pointing out that an opened resource was never closed, or that a specific path through the code will never execute, reviews become draining and boring. (A small example of that resource pattern appears after the references below.)

I also believe that code reviews need to scale up to teams that are not co-located. That might mean using an asynchronous process, like a workflow system, or using collaboration tools to conduct the review through teleconferences and screen sharing. A workflow system can prevent code from being promoted into the mainline build until one or more reviewers have accepted it.

To keep the code reviews interesting and challenging, I give the grunt work to the machines and use static analysis and profiling tools first. Before the humans get involved, the code needs to pass the suite of static analysis tests at the prescribed level. This weeds out the typical mistakes that a compiler alone will not catch. There are many analysis and profiling tools available, both open source and commercial. Most of my development work is in server-side Java, and my analysis tools of choice are FindBugs, PMD and the profiling tool in Rational Software Architect. FindBugs is a bytecode analyzer, so it looks at what the Java compiler produces and is less concerned with the form of the source code. PMD analyzes source code. Both tools have configurable thresholds for problem severity, and both can accept custom problem patterns. PMD ships with a large library of patterns, including checks for overly complex or long functions and methods. The RSA profiling tool measures timing only down to the method level of classes, but it can quickly help a developer focus on where the sluggish parts of a system are hiding, which is valuable information going into a review.

Once the code makes it through this array of automated tests, bring the humans in to look at it and get their input. In our case, I have found this approach changes the review from a potentially adversarial situation into one with an educational tone. The review meeting, if it happens synchronously, is not overtaken by small problems and the pointing out of basic mistakes. It is concerned with making recommendations at a higher level to improve the larger design.

FindBugs, University of Maryland, http://findbugs.sourceforge.net/
PMD, SourceForge, http://pmd.sourceforge.net/
Rational Software Architect for WebSphere Software, http://www-01.ibm.com/software/awdtools/swarchitect/websphere/
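As a postscript, here is the resource-leak sketch promised above. The class and file names are hypothetical, and the defective and corrected versions sit side by side: exactly the grunt work I would rather FindBugs or PMD flag than a human reviewer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical example: the kind of resource-handling defect a static
// analyzer should catch long before a human reviewer reads the code.
public class ConfigLoader {

    // Defective version: if readLine() throws, the reader is never closed.
    // FindBugs/PMD-style checks flag this pattern without human help.
    public String firstLineLeaky(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        String line = reader.readLine();
        reader.close(); // skipped entirely when an exception is thrown above
        return line;
    }

    // Corrected version: the finally block guarantees the stream is closed
    // on every path, freeing the reviewers to talk about design instead.
    public String firstLineSafe(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }
}
```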

December 17, 2009 · 3 min · 432 words · Jim Thario

Testing the Proximity Sensor of iPhone 4

The proximity sensor problem with iPhone 4 is a topic of much debate on discussion boards, blogs and news sites. The proximity sensor is used by the phone to determine whether the user is holding the phone to her ear during a call. The phone uses input from the proximity sensor to decide whether to activate the screen and allow touch input. Many owners of the phone have reported the screen re-enabling while they hold the phone to their ear during a call, while others have reported no problems. I am one of the unfortunate owners who has inadvertently placed a caller on hold or interrupted other callers with touch tones emanating from my end of the call. As of today I am on my second iPhone 4 and disappointed to report that my experience has not improved.

There are plenty of emotional calls for Apple to quickly address this problem. I want to take a different approach. In this essay, I discuss testing approaches and what they mean for complex systems, using the proximity sensor as a real-world example to demonstrate the problem many have experienced and the difficulty involved in testing for it.

Inside iPhone is a complex hardware system arranged in a hierarchy of command and control: a microprocessor, memory, storage, and transceivers for Wi-Fi, cellular and Bluetooth networks. It has touch, light, sound and proximity sensor input. It has external interfaces for the dock, a headset and the SIM card. It has a single display integrated with the touch sensor input. The software distributed through these components is a system of collaborating state machines, each one working continuously to keep the outside world pleased with the experience of interfacing with the phone. It is not just a single human the iPhone must keep satisfied. The cellular networks, Wi-Fi access points, Bluetooth devices, iTunes and other external systems are part of this interactive picture as well. This is oversimplified, but you can begin to appreciate the enormous burden of testing such a small, complex device used by millions of people. How does a team even start to tackle such a problem?

Meyer (2008) presents seven principles for the planning, creation, execution, analysis and assessment of a testing regimen. Meyer writes that the purpose of the testing process, above and beyond any other, “is to uncover faults by triggering failures.” The more failures that are triggered and fixed before a product reaches the end user, the less expensive they are to repair. Humans are a required yet flawed variable in the planning and execution of test suites for complex systems like iPhone. Identifying all possible triggers for failure can be nearly impossible. Savor (2008) argues that, “The number of invariants to consider [in test design] is typically beyond the comprehension of a human for a practical system.” How do we test the multitude of scenarios and their variations in complex systems without fully comprehending usage patterns and subtle timing requirements for failure in advance? Meyer (2008) argues that testing time can be a more important criterion than the absolute number of tests. When time is combined with random testing, there is a possibility of uncovering more faults, including test escapes, than with a huge, fixed suite of tests continuously repeated without deviation.
Test escapes, as defined by Chernak (2001), are defects that the fixed testing suite was not able to find, but that were found later by chance, by an unassociated test, or by an end user after the project was delivered to production. Now that we have some background information and terminology, let’s design a test that could make iPhone’s proximity sensor fail to behave correctly. Consider an obvious test case for the proximity sensor:

1. Initiate or accept a call.
2. Hold the phone against ear. Expect the screen to turn off and disable touch input.
3. Hold the phone away from ear. Expect the screen to turn on and enable touch input.
4. End call.

This test case can be verified in a few seconds. Do you see a problem with it? It is a valid test, but not a terribly realistic one. The problem with this test case is that it does not reflect what really happens during a call. We do not sit frozen with all of our joints locked into place, refusing to move until the call has completed. To improve the test case, we add some physical action during the call:

1. Initiate or accept a call.
2. Hold the phone against ear. Expect the screen to turn off and disable touch input.
3. Keep the phone still for 30 seconds.
4. Change rotation, angle and distance of phone to ear while never exceeding 0.25 inches from the side of the caller’s head. Expect the screen to remain off and touch input to remain disabled.
5. Return to step 3 if call length is less than ten minutes.
6. Hold the phone away from ear. Expect the screen to turn on and enable touch input.
7. End call.

Now the test case reflects more of reality. There are still some problems with it. When I am on a call, I often transfer the phone between ears. Holding a phone to the same ear for a long time gets uncomfortable. During lulls in the conversation, I pull the phone away from my ear to check the battery and signal levels, and then I bring it back to my ear. These two actions need to be added to the test case. Additionally, all of the timing in the test case is fixed. Because of the complex nature of the phone, small variations in timing anywhere can have an impact on successful completion of our test case. Introducing some variability to the test case may raise the chances of finding a failure. In other words, we will purposely create test escapes through random combinations of action and timing.

1. Initiate or accept a call.
2. Hold the phone against ear. Expect the screen to turn off and disable touch input.
3. Keep the phone still for [A] seconds.
4. Randomly choose step 5, 6 or 7:
5. Change rotation, angle and distance of phone to ear while never exceeding 0.25 inches from the side of the caller’s head. Expect the screen to remain off and touch input to remain disabled.
6. Pull phone away from ear for [B] seconds and return phone to ear. Expect the screen to turn on and then off at the conclusion of the action.
7. Move phone to opposite ear. Do not exceed [C] seconds during the transfer. Expect the screen to turn on during the transfer and then off at the conclusion of the transfer.
8. Return to step 3 if call length is less than [D] minutes.
9. Hold the phone away from ear. Expect the screen to turn on and enable touch input.
10. End call.

There are four variables in this test case. It is possible that certain combinations of [A], [B], [C] and [D] will cause the screen to re-enable during a call and cause the test case to fail. Have fun with this one. There are in fact combinations that induce proximity failure on iPhone 4 regardless of the version of iOS, including 4.1. A sketch of a driver that randomizes these variables follows.
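In the sketch below, the ProximityRig interface is hypothetical - it stands in for whatever fixture (a robotic arm, a simulator, or a human operator following prompts) actually moves the phone - and the ranges chosen for [A] through [D] are arbitrary. The point is simply that the random choices are driven by a recorded seed so any failing combination can be replayed.

```java
import java.util.Random;

// Hypothetical driver for the randomized proximity test case above.
public class RandomizedProximityTest {

    interface ProximityRig {
        void startCall();
        void holdToEar();
        void holdStill(int seconds);
        void rotateAndTiltNearEar();          // step 5
        void pullAwayAndReturn(int seconds);  // step 6
        void switchEars(int maxSeconds);      // step 7
        boolean screenIsOff();                // the assertion we care about
        void endCall();
    }

    public static void run(ProximityRig rig, long seed, int callMinutes) {
        Random random = new Random(seed);     // keep the seed so a failure can be replayed
        rig.startCall();
        rig.holdToEar();
        long end = System.currentTimeMillis() + callMinutes * 60_000L; // [D]
        while (System.currentTimeMillis() < end) {
            rig.holdStill(5 + random.nextInt(26));          // [A]: 5-30 seconds of stillness
            switch (random.nextInt(3)) {                    // randomly choose step 5, 6 or 7
                case 0:  rig.rotateAndTiltNearEar(); break;
                case 1:  rig.pullAwayAndReturn(2 + random.nextInt(9)); break; // [B]
                default: rig.switchEars(1 + random.nextInt(5));               // [C]
            }
            if (!rig.screenIsOff()) {
                throw new AssertionError("Screen re-enabled during call (seed=" + seed + ")");
            }
        }
        rig.endCall();
    }
}
```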
Finally, an important part of test design is the inclusion of negative test cases. Chernak (2001) writes, “A test case is negative if it exercises abnormal conditions by using either invalid data input or the wrong user action.” For a device like iPhone, tapping the screen constantly while it is disabled, making a call while holding it upside down, or using a faulty docking cable can all be considered negative test cases.

Testing complex systems, regardless of physical size, is an incredibly difficult task. Some of this can be performed by humans and some through automated systems. Finding failures in highly integrated systems requires a combination of fixed test suites, test cases that reflect real usage scenarios, and the introduction of test escapes through creative randomization.

References

Chernak, Y. (2001). Validating and improving test case effectiveness. IEEE Software, January/February 2001.
Meyer, B. (2008). Seven principles of software testing. Computer, August 2008.
Savor, T. (2008). Testing feature-rich reactive systems. IEEE Software, July/August 2008.

September 23, 2010 · 7 min · 1340 words · Jim Thario

XML's Role in Creating and Solving Information Security Problems

XML provides a means to communicate data across networks and among heterogeneous applications. XML is a common information technology acronym in 2010 and is supported in a large variety of applications and software development tooling. XML’s wide adoption into many technologies means it is likely being used in places not originally imagined by its designers. The resulting potential for misuse, erroneous configuration or lack of awareness of basic security issues is compounded by the speed and ease with which XML can be incorporated into new software systems. This paper presents a survey of the security and privacy issues related to XML technology use and deployment in an information technology system.

The XML Working Group was established in 1996 by the W3C; it was originally named the SGML Editorial Review Board (Eastlake, 2002). Today XML has ten working groups, focused on areas including the core specifications, namespaces, scripting, queries and schema, and service modeling. XML is a descendant of SGML and allows the creation of entirely new, domain-specific vocabularies of elements, organized in hierarchical tree structures called documents. XML elements can represent anything related to data or behavior. An XML document can represent a customer’s contact information. It can represent a strategy to format information on a printer or screen. It can represent musical notes in a symphony.

XML is being used today for a variety of purposes, including business-to-business and business-to-consumer interactions. It is used for the migration of data from legacy repositories to modern database management systems. XML is used in the syndication of news and literary content, as in the application of Atom and RSS feeds by web sites. The flexibility and potential of XML use in information technology received increasing attention when web services technology was introduced. Web services communicate using XML. They can be queried by client programs to learn their methods, parameters and return data. They are self-describing, which means the approach of security through obscurity cannot apply if a web service is discovered running on a publicly accessible server. An attacker can ask the service for its method signatures, and it will respond with specifications of how to invoke it. This does not mean the attacker will have the necessary information, such as an authentication credential or special element of data, to gain access to the web service.

Treese (2002) summarizes the primary security concerns involved with deploying any communications system that must transmit and receive sensitive data:

- Confidentiality, to ensure that only the sender and receiver can read the message
- Authentication, to identify the sender and receiver of a message
- Integrity, to ensure that the message has not been tampered with
- Non-repudiation, to ensure that the sender cannot later deny having sent a message
- Authorization, to ensure that only “the right people” are able to read a message
- Key management, to ensure proper creation, storage, use, and destruction of sensitive cryptographic keys

Web services are a recent technology, but fall prey to attacks similar to those used against past and current Internet technologies. Web services are vulnerable to many of the same attacks as browser-based applications, according to Goodin (2006). Parsing and validation of data provided inside a transmitted XML document must be performed regardless of the source of the transmission; a sketch of a defensively configured parser follows.
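That defensive configuration can look like the following in Java, using the JDK's built-in DOM parser. This is my own illustration rather than code from any of the cited papers; the feature URIs are the Xerces ones honored by the JDK's default implementation, and they address the entity and expansion attacks discussed next.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Sketch of defensive parser configuration for untrusted XML input.
public final class HardenedXml {

    public static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations outright, which blocks entity expansion
        // and external-entity tricks at the door.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and suspenders: disable external general and parameter entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Enable the generic secure-processing limits (entity expansion caps, etc.).
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }
}
```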
DTDs or XML schemas that are not strict enough in their matching constraints can leave an open path for parsing attacks. Goodin (2006) details the primary attacks on web services as:

- Injection attacks that use XML to hide malicious content, such as using character encoding to hide the content of strings
- Buffer overflow vulnerabilities in SOAP and XML parsers running on the system providing the web service
- XML entity attacks, where input references an invalid external file such as a CSS or schema, causing the parser or other parts of the application to crash in unexpected ways

Lawton (2007) details a similar problem with AJAX technology. AJAX stands for Asynchronous JavaScript and XML. It is not so much a specific technology as a technique to reduce the number of whole page loads performed by a browser. An AJAX-enabled application can update portions of a browser page with data from a server. The data transmitted between browser and server in an AJAX communication is formatted in XML. The server side of an AJAX application can be vulnerable to the same attacks described for web services above: overflows, injection of encoded data, and invalid documents. Goodin (2006) recommends that IT staff periodically scan the publicly facing systems of an enterprise for undocumented web services, and scan known web services and applications with analysis tools such as Rational AppScan (2009). Lawton (2007) also recommends the use of vulnerability scanners for source code and deployed systems.

A common mistake made even today in the deployment of web services or web applications is failing to use the HTTPS or TLS protocol to secure the transmission of data between the client and server. All data transmitted across the Internet passes through an unknown number of routers and hosts before arriving at the destination. The format of an XML document makes it easy for eavesdroppers to identify and potentially capture a copy of this data as it passes through networking equipment. The easiest solution to this problem is to host the web service or web application over the HTTPS protocol. HTTPS is HTTP over SSL, which encrypts the data during transmission. HTTPS will not protect data before it leaves the source or after it arrives at the destination.

Long et al. (2003) discuss some of the challenges of bringing XML-encoded transactions to the financial services industry. Privacy is a primary concern for electronic financial transactions. Long states that simply using SSL to encrypt transmissions from system to system is not enough to satisfy the security needs of the financial sector. There also exists a need to encrypt portions of an XML document differently, so that sensitive content has different visibility depending on the system or person accessing it. The XML Encryption Syntax and Processing standard allows any portion of an XML document, or the entire document, to be encrypted with a key and then placed within an XML document for transmission or storage. The encrypted document remains a well-formed XML document. Eastlake (2002) describes the Encryption Syntax Processing and Signature Syntax Processing recommendations for XML. Using the ESP recommendation, portions of the document can be encrypted with different keys, thus allowing different people or applications to read the portions of the document for which they have keys. This approach provides a form of multi-level security within a single XML document.

With web services comes the problem of knowing which ones to trust and use.
Even more difficult is the problem of giving that determination to a computer. Carminati, Ferrari and Hung (2005) describe the problem of automating the evaluation of privacy policies of web services in today’s world of data storage, cloud, banking, financial and multi-player gaming businesses that exist entirely on the Internet. They reason that systems discovered in web services directories may not operate under privacy policies compatible with those required by the consumer’s organization or local laws. They propose three solutions for handling this problem. The first is basic access control from a third party that evaluates and quantifies the privacy policy of a service provider. The second is cryptography in the services directory, so that the consumer can decode only compatible services. The third is a hash-based solution that looks for flags supplied by the web service provider describing its support for specific aspects of privacy policy.

As with the problem of transmitting sensitive XML data across the Internet unencrypted, there is also a problem of authenticating the source of an XML document. How does a person or system verify the document’s originator? The Signature Syntax Processing recommendation briefly mentioned above provides a method to enclose any number of elements in a digital signature. This method uses public key cryptography to sign a portion of the document’s data. The originator of the document provides a public key to the recipient through a secure channel (for example, on a flash drive) in advance of transmitting the data. The originator uses their secret key to sign the document data, which produces a new, smaller block of data called a digital signature. The signature is embedded in XML around the protected elements. The signature and the XML data are used by the recipient to determine if the data was changed in transmission. The signature is also used to verify the identity of the signer. Both authentication steps require the recipient to have the sender’s public key. (A minimal code sketch of this signing approach appears at the end of this discussion.)

The problem of securing documents through path-based access control was addressed early in XML’s lifetime. Damiani et al. (2001) describe an access control mechanism specifically designed for XML documents. Their Access Control Processor for XML uses XPath to describe the target location within a schema for access, along with the rights associated with groups or specific users of the system. Additionally, Böttcher and Hartel (2009) describe the design of an auditing system to determine if confidential information was accessed directly or indirectly. They use a patient records system as an example scenario for their design. Their system is unique in that it can analyze “[…] the problem of whether the seen data is or is not sufficient to derive the disclosed secret information.” The authors do not discuss whether their design is transportable to non-XML data sources, such as relational databases.
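To make the signing discussion concrete, here is a compressed sketch of an enveloped signature over a whole document using the JDK's javax.xml.crypto.dsig API, which implements the Signature Syntax Processing recommendation. The file name and the throwaway RSA key pair are placeholders; in practice the private key would come from a protected key store.

```java
import java.io.File;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import javax.xml.crypto.dsig.*;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: enveloped XML-DSig signature over a whole document using the JDK API.
public class SignDocument {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new File("order.xml")); // placeholder file

        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair keyPair = kpg.generateKeyPair(); // throwaway keys for the sketch only

        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        Reference ref = fac.newReference("",                       // "" = the whole document
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                        fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
        SignedInfo signedInfo = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                        (C14NMethodParameterSpec) null),
                fac.newSignatureMethod("http://www.w3.org/2001/04/xmldsig-more#rsa-sha256", null),
                Collections.singletonList(ref));

        KeyInfoFactory kif = fac.getKeyInfoFactory();
        KeyInfo keyInfo = kif.newKeyInfo(
                Collections.singletonList(kif.newKeyValue(keyPair.getPublic())));

        // Embed the <Signature> element inside the document itself (enveloped form).
        DOMSignContext signContext = new DOMSignContext(keyPair.getPrivate(), doc.getDocumentElement());
        fac.newXMLSignature(signedInfo, keyInfo).sign(signContext);
    }
}
```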
In 2010, we have technologies to use with XML in several combinations to secure document content during transmission and in long-term storage. The use of SSL and the Encryption Syntax Processing and Signature Syntax Processing recommendations provides a rich foundation for creating secure XML applications. The maturity of web servers, the availability of code analyzers and the increasing sophistication of IT security tools decrease the risk of infrastructure falling to an XML-centric attack.

With the technical problems of securing XML addressed through various W3C recommendations, code libraries and tools, the new security issue in XML-related technologies becomes one of education, precedent in use and organizational standards for applying them. This is a recurring problem with many disruptive technologies: awareness. Goodin (2006) says, “[…] the security of web services depends on an increased awareness of the developers who create them, and that will require a major shift in thinking.” XML has introduced, and solved, many of its own security problems through the application of its own technology. It now becomes important for the industry to document and share the experiences and practices of deploying secure XML-based Internet applications using the technologies recommended by the W3C and elsewhere.

References

Böttcher, S., Hartel, R. (2009). Information disclosure by answers to XPath queries. Journal of Computer Security, 17 (2009), 69-99.
Carminati, B., Ferrari, E., Hung, P. C. K. (2005). Exploring privacy issues in web services discovery agencies. IEEE Security and Privacy, 2005, 14-21.
Damiani, E., Samarati, P., De Capitani di Vimercati, S., Paraboschi, S. (2001). Controlling access to XML documents. IEEE Internet Computing, November-December 2001, 18-28.
Eastlake, D. E. III, Niles, K. (2002). Secure XML: The New Syntax for Signatures and Encryption. Addison-Wesley Professional. July 19, 2002. ISBN-13: 978-0-201-75605-0.
Geer, D. (2003). Taking steps to secure web services. Computer, October 2003, 14-16.
Goodin, D. (2006). Shielding web services from attack. Infoworld.com, 11.27.06, 27-32.
Lawton, G. (2007). Web 2.0 creates security challenges. Computer, October 2007, 13-16.
Long, J., Yuan, M. J., Whinston, A. B. (2003). Securing a new era of financial services. IT Pro, July-August 2003, 15-21. 1520-9202/03.
Naedele, M. (2003). Standards for XML and web services security. Computer, April 2003, 96-98.
Rational AppScan. (2009). IBM Rational web application security. Retrieved 14 February 2009 from http://www-01.ibm.com/software/rational/offerings/websecurity/webappsecurity.html.
Treese, W. (2002). XML, web services, and XML. NW, Putting it together, September 2002, 9-12.

March 14, 2010 · 10 min · 1934 words · Jim Thario

Automated Dynamic Testing

In researching some testing solutions for my own work, I found an article in the IEEE library from a group of Microsoft researchers about automating the software testing process (Godefroid et al., 2008). They are taking the concepts of static analysis to the next level by researching and prototyping methods of generating harnesses for automated dynamic testing. They discuss four different projects for test automation, but the most interesting one for me in the article was a project called SAGE (scalable, automated, guided execution).

The SAGE project is based on white box fuzz testing and is intended to help reduce the number of defects related to security. “Security vulnerabilities (like buffer overflows) are a class of dangerous software defects that can let an attacker cause unintended behavior in a software component by sending it particularly crafted inputs.” The solution is white box because the program under test runs under a debugger-like monitor. The monitor observes and catches runtime exceptions generated by the program as the testing suite exercises it with a variety of dynamically generated invalid input data. The tester and monitor programs are able to record, pause and replay for engineers the history of events up to the exception that caused the program to crash. (A toy sketch of the simpler, black-box cousin of this idea follows the reference.)

An early version of SAGE was able to find a defect in a Windows kernel-level library responsible for parsing animated cursor image files. The tool generated over 7,700 test cases based on sample input data from testers and exercised the library for a little more than seven hours before the defect was uncovered. After analysis of the SAGE data, a fix for the defect was released as an out-of-band security patch for Windows. The authors write, “SAGE is currently being used internally at Microsoft and has already found tens of previously unknown security-related bugs in various products.”

Reference

Godefroid, P., de Halleux, P., Levin, M. Y., Nori, A. V., Rajamani, S. K., Schulte, W., Tillmann, N. (2008). Automating software testing using program analysis. IEEE Software. 0740-7459/08.
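Strictly as a toy, the sketch below randomly mutates a valid sample input, feeds it to the component under test, and records the seed of anything that crashes so it can be replayed. SAGE itself is far more sophisticated - white box and constraint-driven - and nothing here is drawn from the paper; parse() and the sample bytes are placeholders.

```java
import java.util.Arrays;
import java.util.Random;

// A deliberately simple black-box mutation fuzzer, sketched for comparison only.
public class MiniFuzzer {

    static void parse(byte[] input) {
        // Placeholder for the component under test, e.g. a file or image parser.
    }

    public static void main(String[] args) {
        byte[] seedInput = {'R', 'I', 'F', 'F', 0, 0, 0, 0}; // a well-formed sample input
        for (long seed = 0; seed < 10_000; seed++) {
            Random random = new Random(seed);
            byte[] mutated = Arrays.copyOf(seedInput, seedInput.length);
            // Flip a handful of random bytes in the sample input.
            for (int i = 0; i < 1 + random.nextInt(4); i++) {
                mutated[random.nextInt(mutated.length)] = (byte) random.nextInt(256);
            }
            try {
                parse(mutated);
            } catch (RuntimeException crash) {
                // Record the seed so the exact failing input can be replayed later.
                System.out.println("Crash with seed " + seed + ": " + crash);
            }
        }
    }
}
```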

December 23, 2009 · 2 min · 333 words · Jim Thario

Easing into Agile

The article I found this week was written by two individuals working for Nokia Networks who were involved in training product development staff in agile practices. Vodde and Koskela (2007) discuss Nokia’s environment over the past decades and their experiences in introducing test-driven development into the organization. The implication in the article is that because of the size and amount of retraining necessary to move toward agile development, Nokia is adopting agile practices a piece at a time (small bites) versus dropping the waterfall approach entirely and throwing the development teams into a completely new and unfamiliar situation. Vodde and Koskela also point out the benefit they found in using hands-on instruction for TDD versus lecture-based education.

The authors made a few observations while teaching TDD to experienced software developers. One important observation was, “TDD is a great way to develop software and can change the way you think about software and software development, but the developer’s skill and confidence still play a big role in ensuring the outcome’s quality.” The exercise the authors used in their course was to develop a program to count lines of code in source files, along with tests to verify the program’s operation. Each session would add a new requirement in the form of a new type of source file. The students were forced into an evolutionary, emergent situation in which the design had to change a little as the current and new problems of each requirement were solved. What the students speculated the design would be at the beginning and what they actually ended up with were different. (A sketch of what the first tests in such an exercise might look like follows the reference.)

The authors conclude with some recommendations for successful TDD adoption, whether alongside other agile practices or as an isolated practice in a legacy environment:

- Removing external dependencies helps improve testability
- Reflective thinking promotes emergent design
- A well-factored design and good test coverage also help new designs emerge

Reference

Vodde, B., Koskela, L. (2007). Learning test-driven development by counting lines. IEEE Software. 0740-7459/07.
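The LineCounter class below and its rules are my guess at the shape of the exercise, not the authors' actual code. In the spirit of TDD, the implementation is just the simplest thing that satisfies these three JUnit tests, and each new session's requirement would arrive as another failing test.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// A guess at the first tests a student might write in the line-counting exercise.
public class LineCounterTest {

    @Test
    public void emptySourceHasZeroLines() {
        assertEquals(0, new LineCounter().count(""));
    }

    @Test
    public void countsPhysicalLinesOfJavaSource() {
        assertEquals(2, new LineCounter().count("public class A {\n}\n"));
    }

    @Test
    public void blankLinesAreNotCounted() {
        assertEquals(2, new LineCounter().count("int a;\n\n\nint b;\n"));
    }

    // Minimal implementation that satisfies these tests: the "simplest thing
    // that could possibly work" step of the TDD cycle.
    static class LineCounter {
        int count(String source) {
            int lines = 0;
            for (String line : source.split("\n", -1)) {
                if (!line.trim().isEmpty()) {
                    lines++;
                }
            }
            return lines;
        }
    }
}
```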

December 23, 2009 · 2 min · 324 words · Jim Thario

Software Engineering

Many of us in the IT industry aspire to create a Software Engineering discipline. We work continually to mature our understanding of what it is and should become, and work to increase the external trust of the profession. Are we there yet in relation to other engineering disciplines? Probably not. Whether or not it is there today does not matter as much to me. What matters to me is that at this time we are trying to take it there. My feeling is that Software Engineering is a pursuit, not an endpoint. I also believe software craftsmanship exists, and there is a place for it. I do not want a craftsman designing my antilock brakes, getting creative with my future (hopefully distant) artificial heart, liver or whatever code, or the algorithm for measuring the carbon monoxide levels in my home. I would like an engineer knowledgeable in precedent and predictability to create these things.

Denning and Riehle (2009) point out some interesting areas where Software Engineering is weak compared to other disciplines:

- Predictable outcomes (principle of least surprise)
- Design metrics, including design to tolerances
- Failure tolerances
- Separation of design from implementation
- Reconciliation of conflicting forces and constraints
- Adapting to changing environments

I think an additional challenge we deal with in developing a Software Engineering discipline is that software - code - is unlike any material previously available to us. Add to this that the forms and structures the material can take change every five to ten years - Java, C#, client/server, web services, hosted, distributed, etc. We are trying to build a stable practice around an unstable material. For example, our environment is beginning an architectural shift toward large multi-core processors (Merritt, 2008). Our tools, thinking and education may require a refresh to adapt our software design approaches to deal with this change. (See http://clojure.org/state).

In short, I believe in Software Engineering. It is out there and we are chasing it down. We make some right and wrong turns along the way. Each time we get a little closer to it, our world of technology changes dramatically and it just slips out of our grasp. The longer we hunt for it, the more mature, disciplined and predictable our profession becomes.

References

Denning, P., & Riehle, R. (2009). The profession of IT: Is software engineering engineering? Communications of the ACM, 52(3), 24-26.
Merritt, R. (2008). CPU designers debate multi-core future. EE Times. Retrieved 24 October 2009 from http://www.eetimes.com/showArticle.jhtml?articleID=206105179

October 25, 2009 · 2 min · 402 words · Jim Thario

Use of Cryptography in Securing Database Access and Content

This research paper explores the use of cryptography in database security. It specifically covers applications of encryption in authentication, transmission of data between client and server, and protection of stored content. The paper begins with an overview of encryption techniques, specifically symmetric and asymmetric encryption. It follows with a specific discussion of the use of cryptography in database solutions. The paper concludes with a short summary of commercial solutions intended for increasing the security of database content and client/server transactions.

Whitfield Diffie, a cryptographic researcher and Sun Microsystems CSO, says, “Cryptography is the most flexible way we know of protecting [data and] communications in channels that we don’t control.” (Carpenter, 2007). Cryptography is “the enciphering [encryption] and deciphering [decryption] of messages in secret code or cipher; the computerized encoding and decoding of information.” (CRYPTO, 2009). There are two primary means of encryption in use today: symmetric key encryption and asymmetric key encryption. Symmetric key encryption uses a single key to encrypt and decrypt information. Asymmetric key encryption, also known as public key cryptography, uses two keys - one to encrypt information and a second to decrypt it. In addition to encryption and decryption, public key cryptography can be used to create and verify digital signatures of blocks of text or binary data without encrypting them. A digital signature is a small block of information cryptographically generated from content, like an email message or an installation program for software. The private key in the asymmetric solution can be used to create a digital signature of data, while the public key verifies the integrity of the data against the signature created with the private key.

The main advantage of public key cryptography over the symmetric key system is that the public key can be given away - as the name implies, made public. Anyone with a public key can encrypt a message, and only the holder of the matching private key can decrypt that message. In the symmetric system, all parties must hold the same key. Public key cryptography can be used to verify the identity of an individual, application or computer system. As a simple example, let us say I have an asymmetric key pair and provide you with my public key. You can be a human or a software application. As long as I keep my private key protected so that no one else can obtain it, only I can generate a digital signature that you can use with my public key to prove mathematically that the signature came from me. This approach is much more robust and less susceptible to attack than the traditional username and password approach.

Application of cryptography does not come without the overhead of ongoing management of the technology. In a past interview (Carpenter, 2007), Whitfield Diffie, a co-inventor of public key cryptography, says the main detractor from widespread adoption of strong encryption within IT infrastructures is key management - managing the small strings of data that keep encrypted data from being deciphered. Proper integration of cryptographic technologies into a database infrastructure can provide protection beyond username and password authentication and authorization. It can prevent anyone without the keys from reading sensitive data during transmission or while stored on media.
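To ground the terminology above, here is a minimal sketch using the JDK's javax.crypto and java.security APIs. The key sizes and the sample message are arbitrary, and a real system would never generate throwaway keys like this; the sketch only contrasts the two encryption families.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Minimal contrast of symmetric and asymmetric encryption.
public class CipherDemo {
    public static void main(String[] args) throws Exception {
        byte[] message = "a sensitive note".getBytes(StandardCharsets.UTF_8);

        // Symmetric: one shared AES key both encrypts and decrypts.
        SecretKey aesKey = KeyGenerator.getInstance("AES").generateKey();
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.ENCRYPT_MODE, aesKey);
        byte[] sealed = aes.doFinal(message);
        aes.init(Cipher.DECRYPT_MODE, aesKey);
        byte[] opened = aes.doFinal(sealed);

        // Asymmetric: anyone with the public key can encrypt, but only the
        // holder of the matching private key can decrypt.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair pair = kpg.generateKeyPair(); // throwaway keys for the sketch only
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.ENCRYPT_MODE, pair.getPublic());
        byte[] sealedForOwner = rsa.doFinal(message);
        rsa.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        byte[] recovered = rsa.doFinal(sealedForOwner);

        System.out.println(Arrays.equals(message, opened) && Arrays.equals(message, recovered));
    }
}
```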
Some U.S. government standards require the use of encryption for stored and transmitted personal information. Grimes (2006) details the recent laws passed in the United States requiring the protection of personal data. These laws include the Gramm-Leach-Bliley Act for protection of consumer financial data, the Health Insurance Portability and Accountability Act for personal health-related data, and the Electronic Communications Privacy Act, which gives broad legal protection to electronically transmitted data.

As discussed above, public key cryptography can be used to authenticate a person, application or computer using digital signature technology. A database management system enhanced to use public keys for authentication would store those keys and associate them with specific users. The client would use its private key to sign a small block of data randomly chosen by the server. The client would return a digital signature of that data, which the server could verify using the stored public keys of the various users. A verification match would identify the specific user. (A minimal sketch of this challenge-response exchange appears below.)

The second application of encryption technology in database security is protecting the transmission of data between a client and server. The client may be a web-based application running on a separate server and communicating over a local network, or it may be a fat client located in another department or at some other location on the Internet. A technology called TLS can be used to provide confidentiality of all communications between the client and server, i.e. the database connection. “Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are cryptographic protocols that provide security and data integrity for communications over networks such as the Internet.” (TLS, 2009). Web servers and browsers use the TLS protocol to protect data transmissions such as credit card numbers or other personal information. The technology can be used to protect any data transmission for any type of client-server solution, including database systems. TLS also has authentication capability using public key cryptography. This type of authentication would only allow known public keys to make a connection. This approach is not integrated at a higher level in the solution, such as the application level.

Finally, cryptography can be used to protect the entire content of database storage, specific tables, or columns of table data. Encrypting stored content can protect sensitive data from access within the database management system, through loss of the storage media, or by an external process that reads raw data blocks from the media. The extent to which stored content is encrypted must be weighed against the overhead of encrypting and decrypting data for transaction-intense systems. Britt (2006) stresses the importance of selectively encrypting only those portions of the content that are evaluated to be a security risk if released to the public. He says a “[…] misconception is that adding encryption will put a tremendous strain on database performance during queries and loads.” This type of protection often uses symmetric key encryption because it is much faster than the public key solution. Marwitz (2008) describes several levels of database content encryption available in Microsoft SQL Server 2005 and 2008. SQL Server 2008 provides the ability to use public key authentication directly in the access control subsystem.
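In the sketch below, both sides of that challenge-response exchange are collapsed into one method for brevity. A real database server would look up the stored public key for the user, transport the challenge and response over the wire, and guard against replay; the algorithm names are standard JDK identifiers.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;

// Sketch of public-key challenge-response authentication: the server sends a
// random challenge, the client signs it with its private key, and the server
// verifies the signature against the public key it has on file for that user.
public class ChallengeResponseDemo {
    public static void main(String[] args) throws Exception {
        // Enrollment: the client's public key is registered with the server.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair clientKeys = kpg.generateKeyPair();
        PublicKey storedPublicKey = clientKeys.getPublic();

        // Server side: choose a fresh random challenge for this login attempt.
        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);

        // Client side: sign the challenge with the private key.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(clientKeys.getPrivate());
        signer.update(challenge);
        byte[] response = signer.sign();

        // Server side: verify the response with the stored public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(storedPublicKey);
        verifier.update(challenge);
        System.out.println("authenticated = " + verifier.verify(response));
    }
}
```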
Beyond authentication, the entire database server storage, individual databases and table columns can be encrypted using public key encryption (SQLS, 2009). Table columns, such as those used to store social security numbers, credit card numbers or other sensitive personal information, are a good choice for performance-sensitive systems. Use of this capability means that the only way to obtain access to the unencrypted data within a column protected in this manner is to use the private key of an individual who has been granted access. The user’s private key is used to authenticate and gain access to information in the database. Extra protection is gained since the private key is never co-located with the encrypted data.

IBM’s DB2 product supports a number of different cryptographic capabilities and attempts to leverage as many of the capabilities present in the hosting operating system as possible - Intel-based, minicomputer or mainframe. Authentication to the database from a client can be performed over a variety of encrypted connection types or using Kerberos key exchange. DB2 also supports the concept of authentication plug-ins that can be used with encrypted connections. After authentication has succeeded, DB2 can provide client-server data transmission over a TLS connection and optionally validate the connection using public key cryptography. Like Microsoft SQL Server, the most recent releases of DB2 can encrypt the entire storage area, single databases, or specific columns within the database (DB2, 2009).

This paper provided a broad survey of how cryptographic technologies can raise the security posture of database solutions. Cryptography is becoming a common tool to solve many problems of privacy and protection of sensitive information in growing warehouses of online personal information. This paper described the use of cryptography in database client authentication, transmission of transaction data, and protection of stored content. Two commercial products’ cryptographic capabilities were explored in the concluding discussion. There are more commercial, free and open source solutions for protecting database systems not mentioned in this paper. As citizens and government continue to place pressure on institutions to protect private information, expect to see the landscape of cryptographic technologies for database management systems expand.

References

Britt, P. (2006). The encryption code. Information Today. March 2006, vol. 23, issue 3.
Carpenter, J. (2007). The grill: an interview with Whitfield Diffie. Computerworld. August 27, 2007. Page 24.
CRYPTO. (2009). Definition of cryptography. Retrieved 18 July 2009 from http://www.merriam-webster.com/dictionary/cryptography.
DB2. (2009). DB2 security model overview. Retrieved 18 July 2009 from http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.sec.doc/doc/c0021804.html.
Grimes, R. A. (2006). End-to-end encryption strategies. Infoworld. September 4, 2006. Page 31.
Marwitz, C. (2008). Database encryption solutions: protect your databases - and your company - from attacks and leaks. SQL Server Magazine. September 2008.
SQLS. (2009). Cryptography in SQL Server. Retrieved 18 July 2009 from http://technet.microsoft.com/en-us/library/cc837966.aspx.
TLS. (2009). Transport layer security. Retrieved 18 July 2009 from http://en.wikipedia.org/wiki/Transport_Layer_Security.

July 22, 2009 · 8 min · 1533 words · Jim Thario

Application of Formal Methods in the Design of Reliable and Secure Software

This research paper explores the use of formal methods in software engineering to design reliable and secure software systems. Formal methods are mathematically based languages or visual notations used to specify behavior, algorithms or other aspects of program execution while remaining technology independent. This paper provides a brief overview of formal methods and several of the more popular implementations in use today for software and systems development. It presents the benefits and drawbacks of formal methods, including reasons why formal methods are not commonplace for all software development. The precision of formal methods provides some opportunity for automation in the software development lifecycle, including code generation and automated testing. An exploration of several problem domains where formal methods are often applied is provided. The paper concludes with a discussion of the viability of formal methods as a continuing tool of software engineering.

Hinchey (2008) defines formal methods as an approach in which “[…] a specification notation with formal semantics, along with a deductive apparatus for reasoning, is used to specify, design, analyze, and ultimately implement a hardware or software (or hybrid) system.” Formal methods have a relationship to some of the earliest research in algorithms and automated computation. Pure mathematics and symbolic languages were the sole means of algorithmic expression before general-purpose software languages and microprocessors. One early incarnation of a language for computation was the Turing machine, conceived by Alan Turing in 1936. Turing machines are “[…] simple abstract computational devices intended to help investigate the extent and limitations of what can be computed.” (TM, 2009). Before automated computation was truly possible, many scientific minds were working on ways to direct a computational machine in precise ways.

Traditionally, formal methods are used in the specification and development of systems requiring high dependability, such as communication, flight control and life support. Something is dependable if its performance is constant. Reliability is the degree to which something is accurate, stable, and consistent. Security is a guarantee against loss or harm. Hanmer et al. (2007) discuss the relationship between security and dependability, and the common quality attributes of the two when developing a system. They state that something is dependable if it exhibits reliability, maintainability, availability and integrity. Something is secure if it exhibits availability, integrity and confidentiality. The commonality between the two sets is availability and integrity. In the information technology world, the opposites of these two qualities are downtime and inconsistency - something we often see today resulting from informal software specification and lackluster development processes.

As mentioned above, formal methods can be applied in the phases of specification, design, implementation or verification of software systems. There is potential use for formal methods throughout the entire development lifecycle. Requirements for software systems typically come from stakeholders in the domain in which the software is used, such as aerospace or finance. Those requirements are provided in human-readable form and need an initial transformation into a more precise language. Software designers can refine the formal specification through a series of iterations and deliver it to developers for implementation.
The architecture, functionality and quality attributes of the software can be checked against the formal specifications during peer reviews with other designers and developers. Finally, the teams responsible for testing and verification of the system’s proper operation can use the formal specifications as scripts for developing test suites for automated or manual execution.

The specifications from formal methods can be used for more than documentation of a system’s requirements and behavior. The precision in many formal methods allows the use of automation to reduce human error and increase consistency in the delivery of the final product. Translation of some or all of a formal method language into a general-purpose source language is possible, freeing the developers to concentrate on interesting refinements and optimization of the code versus laboriously writing every line by hand. Stotts et al. (2002) describe a project in which JUnit test cases were generated from formal method specifications. The automated approach enabled them “[…] to generate more test methods than a programmer would by following the basic JUnit practice, but our preliminary experiments show this extra work produces test suites that are more thorough and more effective at uncovering defects.” The formal methods research team at NASA Langley Research Center has developed a domain-specific formal method language called the Abstract Plan Preparation Language. The team created the language “[…] to simplify the formal analysis and specification of planning problems that are intended for safety-critical applications such as power management or automated rendezvous in future manned spacecraft.” (Butler, 2006).

There are economic disadvantages to applying formal methods in software development projects. Formal methods are typically more mathematically intensive than flowcharts or other modeling notations. They are also more precise and rigorous, which results in more time spent expressing the solution in a formal method notation than in a visual modeling language. A developer experienced in application-level design and implementation may have less education in the computational mathematics required to work with formal method notation. A primary complaint from designers and developers is that the solution must be specified twice: once in the formal method notation and again in the software language. The same argument persists in the visual modeling community, which does embrace the use of model transformation to source code to reduce the duplication of effort. The availability of formal method transformation tools to generate source code helps eliminate this issue as a recurring reason not to use formal methods.

Several formal methods are popular today, including Abstract State Machines, B-Method, Petri nets and Z (zed) notation. Petri nets date back to 1939, Z was introduced in 1977, abstract state machines in the 1980s, and B-Method is the most recent, from the 1990s. Petri nets are found in the analysis of workflows, concurrency and process control. The Z formal method language is based on notations from axiomatic set theory, lambda calculus and first-order predicate logic (Z, 2009). It was standardized by ISO in 2002. Abstract state machines resemble pseudo-code and are easy to translate into software languages (a toy, hand-translated example follows). Several tools exist to verify and execute abstract state machine code, including CoreASM, available on SourceForge.net.
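This toy example is my own illustration and does not follow CoreASM or any published notation: the machine's state is a water level and a pump flag, two guarded rules update the state on each step, and the invariants from the imagined specification are checked after every step (run with java -ea to enable the assertions).

```java
// Hand translation of an abstract-state-machine-style specification (illustrative only).
public class TankController {
    static final int LOW = 20, HIGH = 80;

    int waterLevel = 50;    // state variable
    boolean pumpOn = false; // state variable

    // One "step": fire every rule whose guard is true, then check the invariants.
    void step(int inflow) {
        waterLevel = waterLevel + inflow - (pumpOn ? 10 : 0);

        if (waterLevel >= HIGH) {  // guard -> update
            pumpOn = true;
        }
        if (waterLevel <= LOW) {   // guard -> update
            pumpOn = false;
        }

        // Invariants from the imagined specification: the pump always runs
        // above HIGH and is always off below LOW.
        assert !(waterLevel > HIGH) || pumpOn : "pump must run above HIGH";
        assert !(waterLevel < LOW) || !pumpOn : "pump must stop below LOW";
    }

    public static void main(String[] args) {
        TankController tank = new TankController();
        for (int i = 0; i < 100; i++) {
            tank.step((int) (Math.random() * 15)); // random inflow each step
        }
        System.out.println("level=" + tank.waterLevel + " pumpOn=" + tank.pumpOn);
    }
}
```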
Finally, B-Method is a lower-level specification language with a wide range of tool support. It is popular in the European development community and has been used to develop safety systems for the Paris Metro rail line (BMETH, 2009).

The use of formal methods as a way of increasing software dependability and security remains strong in industries where even partial failure can result in unacceptable loss of money, time and, most importantly, life. The choice of applying formal methods in a development project is often an economic, risk-based decision. There will continue to be application programs without the budget or the convenience of time to add the extra process and labor required to transform requirements into formal method specifications and then into source code. However, the pattern of formal method use remains consistent in safety- and security-critical systems. The development and refinement of formal methods continues into this decade, most recently with the standardization of the Z method by ISO. The activity surrounding tooling and automation to support formal methods during the development lifecycle appears to be growing. Perhaps the software industry is closing in on a point of balance among formality in specification, time to market and automation in solution development.

References

ASM. (2009). Abstract state machines. Retrieved 11 July 2009 from http://en.wikipedia.org/wiki/Abstract_State_Machines.
BMETH. (2009). B-Method. Retrieved 11 July 2009 from http://en.wikipedia.org/wiki/B-Method.
Butler, R. W. (2006). An abstract plan preparation language. NASA Langley Research Center, Hampton, Virginia. NASA/TM-2006-214518. Retrieved 11 July 2009 from http://shemesh.larc.nasa.gov/fm/papers/Butler-TM-2006-214518-Abstract-Plan.pdf.
Hanmer, R. S., McBride, D. T., Mendiratta, V. B. (2007). Comparing reliability and security: concepts, requirements, and techniques. Bell Labs Technical Journal 12(3), 65-78 (2007).
Hinchey, M., Jackson, M., Cousot, P., Cook, B., Bowen, J. P., Margaria, T. (2008). Software engineering and formal methods. Communications of the ACM, 51(9). September 2008.
Stotts, D., Lindsey, M., Antley, A. (2002). An informal formal method for systematic JUnit test case generation. Technical Report TR02-012. Department of Computer Science, University of North Carolina at Chapel Hill. Retrieved 11 July 2009 from http://rockfish.cs.unc.edu/pubs/TR02-012.pdf.
TM. (2009). Turing machine. Retrieved 11 July 2009 from http://plato.stanford.edu/entries/turing-machine/.
Z. (2009). Z notation. Retrieved 11 July 2009 from http://en.wikipedia.org/wiki/Z_notation.

July 17, 2009 · 7 min · 1369 words · Jim Thario

Research Project Proposal: Model-Driven Information Repository Transformation and Migration

This project will apply the Unified Modeling Language to the visual definition of data transformation rules that direct the execution of data migration from one or more source information repositories to a target information repository. It will result in a UML profile optimized for defining data transformation and migration among repositories. I believe that a visual approach to specifying and maintaining the rules of data movement between the source and target repositories will decrease the time required to define these rules, enable less technical individuals to adopt them, and provide a motivation to reuse these models to accelerate future migration and consolidation efforts.

Problem Statement and Background

My role in this project includes project planning and task management, and I am the primary researcher and developer of the project’s deliverables. My technical background includes IBM certification as an OOAD designer in Unified Modeling Language and nearly two decades as a software engineer. I have recently been involved in the migration of several custom knowledge data repositories to an installation of IBM Rational Asset Manager.

This project will use a constructive ontology and epistemology to create a new solution in the problem space of the project. This is the most appropriate research ontology and epistemology because there is little precedent in exactly this area of research. Visual modeling of program specifications has been studied in other problem domains and continues to be an area of interest. This particular problem space is unique, relatively untouched, and in an area of considerable interest to me. A possible constraint of the project is shortcomings in the UML metamodel rules that may prevent the extension and definition of an effective rules-based data transformation and migration language. A second constraint may be the identification of one or more source repositories as candidates for moving to a new system. For the second constraint, one or more simulated repositories may need to be created.

This study is relevant to software engineering practitioners, information technology professionals, database administrators and enterprise architects who wish to consolidate data repositories into a single instance. Unified Modeling Language (UML) is primarily used today in information technology to visually specify requirements, architectures and designs of systems, to verify and create test scenarios, and to perform code generation. The UML metamodel was designed to make the language extensible, with the ability to support profiles that allow the language to be customized for specific problem domains. Researchers and practitioners are finding innovative uses for UML as a visual specification language. Zulkernine, Graves, Umair and Khan (2007) recently published their results in using UML to visually specify rules for a network intrusion detection system. Devos and Steegmans (2005) also published their results in using Unified Modeling Language in tandem with the Object Constraint Language to specify business process rules with validation and error checking.

This project will contribute to at least two fields of information technology: visual modeling languages, and information consolidation and management. It will make a unique contribution to the subject area of domain-specific visual languages for the definition of rules.
Additionally, a successful outcome from this project will contribute to knowledge in the area of lowering the complexity of consolidating repositories to save operations costs and increase modernization of data access systems. An opposing approach to this project would be a federated solution to data consolidation. A federated solution would continue to maintain multiple data repositories and connect their operations via programming interfaces so that clients could access them and combine their data to create the appearance of a unified source.

The project’s area of focus was motivated by my desire to create a visual system for the complete migration of a source repository of technical data, such as a technical support knowledge base, to a new product called Rational Asset Manager. My overall goal was to drive the entire migration visually using a single model specification. This specification would visually specify the rules for migrating and transforming data from one system to another as well as visually select the technical mechanisms used to communicate with each information repository, such as SQL databases, web services, XML translation, etc. In addition, I wanted to generate some executable code from the models that would carry out some or all of the movement of data between repositories. In scaling this broad problem area down, I decided to focus on using the model as a specification that would be read by an existing program to carry out the instructions in the model. This program already exists, but does not yet know how to read models. Finally, in narrowing to a specific part of the visual specification, I decided to focus on an aspect of the model that locates data in one system, potentially re-maps or transforms it, and places it into the target system. The initial research focus will take the form of a UML profile that can be used to specify this aspect of the solution, along with an extension of the existing migration program to use the model to perform its work.

Project Approach and Methodology

This project will use a design science methodology to iteratively create, test, and refine the project’s deliverables. The design science methodology defines five process steps in achieving the outcome of a research project: awareness of problem, suggestion, development, evaluation, and conclusion. This project is currently at the awareness-of-problem phase. The inputs to this phase have been my experiences working within the problem space for the last several years and the secondary research into the problem area performed thus far. I have encountered shortcomings in automation to help accelerate solutions in this problem space. At the same time, I have observed closely related problems overcome using visual and declarative technologies. Additional secondary research is being conducted to understand the body of knowledge associated with this area of visual modeling. The output of this phase is this proposal for a project to develop a visual language to help accelerate solutions in this problem space. Significant elements of the proposal include the overall vision of the project, the risks of the project, the tools and resources required to carry it out, and the initial schedule to complete it.

Following an accepted proposal, the next phase of this methodology is the suggestion phase, which involves a detailed analysis and design of the proposed solution. During the suggestion phase, several project artifacts will be created and updated with new information.
Updated artifacts include the project risks and a refined schedule for completing the project. New artifacts produced in this phase include early UML and migration-tool prototypes to explore technical alternatives, detailed test and validation plans, and, most importantly, the design plans for the following phase of the project. A significant activity in this phase is the acquisition and preparation of project resources, such as physical labs, input test data from candidate repositories, access to networked systems for acquiring the test data, and installation of hardware and software tools.

The development phase of the project uses the design plans established in the suggestion phase to construct the first iteration of the solution. Experience during this phase also drives refinements to the project schedule, the detailed test and validation plans, the risks, and the design plan of the solution. The deliverable of this phase is the first generation of the UML profile together with extensions to the existing migration tool that support parsing and using models created with the profile. The test specification models are used to move a larger portion of the candidate source repositories to the target repository. After this phase concludes, the project may return to an earlier phase to refine plans or project scope based on what was learned during development. If acceptable progress is demonstrated at the conclusion of this phase, the project will continue to the evaluation phase.

The evaluation phase focuses most of its effort on formal testing and validation of the solution produced in the development phase. Evaluating the work against the thesis includes working with specific individuals to determine whether the approach does in fact save time and simplify the specification of data migration and transformation rules. Documentation of the testing outcome and comparison with the anticipated outcome may cause the project to return to an earlier phase to adjust scope or expectations. If it is decided that the project has met its goals, or that the goals are not achievable with the project's approach, the effort will conclude.

The conclusion of the project will involve final documentation of the outcome and packaging of all the project's artifacts for future research studies. The artifact package will be placed in a public location for others to review and use.

As mentioned above, this project will require several physical resources and the cooperation of technical experts. The study will require access to two or more legacy data repositories as sources of information. The source repositories should ideally use different underlying database technologies and implement different information schemas, in order to exercise variations of the proposed modeling language as it is developed and tested. Access to the technical administrators of the source repositories will be necessary to understand the repositories' schemas and to obtain read-only access or a copy of their information. It is preferred that the repositories be accessed read-only over a network, or that their information be relocated to a computing system directly available to the research project. The study will require at least one server system running IBM Rational Asset Manager. This system will act as the target data repository.
Data transformed from the source repositories will migrate into Rational Asset Manager, driven by a migration application that uses the visual specifications as direction. The study will also require a single workstation with IBM Rational Software Architect for development of the visual modeling language and for extending the existing migration programs to read the visual models and perform the migration from the source to the target repositories.

Determining the project's success requires measuring the time saved in building a migration solution with and without visual specifications. The migration problems need to be varied as well, ranging from simple one-to-one mappings from a single source repository to a single target repository, to more exotic scenarios such as consolidating multiple source repositories into a single target repository and re-mapping values from the source to the target. The reusability of previous solutions will also be measured; this aspect of the project's outcome will quantify how easily a specification model from a previous solution can be reused.

Definition of the End Product of the Project

This project will produce several artifacts during its life and at its conclusion. Most importantly, a UML profile will be developed that can be imported into Rational Software Architect or Rational Software Modeler. The profile will include usage documentation and example models that demonstrate the types of rules that may be specified in a visual model and how such a model is read and executed by the migration program. The migration program will be a reference implementation that extends an existing tool to read models configured with the UML profile and to generate events for extension points to act on.

In addition to the technical deliverables, all project planning and process artifacts, such as the project plan, design plan, risk and mitigation notes, test criteria and test result data, will be made available. The project will conclude with the development of at least one article or paper for submission to a research journal to document the project's challenges and achievements, and an annotated bibliography of the secondary research related to the project will be provided.

If successful, this project will contribute to simplifying part of the process of developing a migration solution without having to recreate the existing tool in use today. The project will add a new component to the migration tool, and consumers of the tool can choose whether to use it. An assumption of this research project is that the UML profile developed as a deliverable will be an approachable alternative for less experienced IT professionals and software engineers; validating that assumption will be one of the challenges in evaluating the project's results.

References

Devos, F., & Steegmans, E. (2005). Specifying business rules in object-oriented analysis. Software and Systems Modeling, 4, 297–309. doi:10.1007/s10270-004-0064-z

Zulkernine, M., Graves, M., Umair, M., & Khan, A. (2007). Integrating software specifications into intrusion detection. International Journal of Information Security, 6, 345–357. doi:10.1007/s10207-007-0023-0 ...

December 10, 2008 · 10 min · 2066 words · Jim Thario

Tightening things up with DSHIELD

I was first introduced to DSHIELD last month. My particular interest was in the textual feeds of recommended hosts to block at the firewall. The lists come in the form of a text file listing individual hosts and entire networks, and the feeds are refreshed on a regular basis from community input. I wrote a small shell script to pull these recommended lists and create an iptables chain that is called from my existing server firewall rules. The input, output and forwarding chains all call the DSHIELD chain. After about a month of use it seems to have paid off: the DSHIELD chain in my firewall rules blocks many packets from these blacklisted hosts, and so far no one has complained. The script runs nightly to refresh the DSHIELD chain, and if for any reason it cannot contact the DSHIELD site, it keeps the existing rules in place. Here is the BASH shell script I use on Fedora and CentOS servers. ...
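
The script itself is cut off in this excerpt, so below is only a minimal sketch of how such a nightly refresh might look, not the script referenced above. The feed URL, chain name, temporary file path and the parsing of the feed's columns are all assumptions that should be checked against the current DSHIELD block list before use.

#!/bin/bash
# Sketch of a nightly DSHIELD refresh (assumptions noted below).
# It downloads a recommended block list, rebuilds a dedicated iptables
# chain, and leaves the existing chain untouched if the download fails.

FEED_URL="https://feeds.dshield.org/block.txt"   # assumed feed location
CHAIN="DSHIELD"
TMPFILE="$(mktemp /tmp/dshield.XXXXXX)"

# Fetch the feed; bail out and keep the current rules if the site is unreachable.
if ! curl -fsS "$FEED_URL" -o "$TMPFILE"; then
    echo "Could not fetch DSHIELD feed; keeping existing $CHAIN rules." >&2
    rm -f "$TMPFILE"
    exit 1
fi

# Make sure the chain exists, then empty it before repopulating.
iptables -N "$CHAIN" 2>/dev/null
iptables -F "$CHAIN"

# Add a DROP rule for each listed network. The assumption here is that
# non-comment lines carry "start end maskbits" columns; verify against
# the live feed and adjust the awk expression if the format differs.
grep -v '^#' "$TMPFILE" | awk 'NF >= 3 { print $1 "/" $3 }' | while read -r net; do
    iptables -A "$CHAIN" -s "$net" -j DROP
done

rm -f "$TMPFILE"

The chain only takes effect once the existing rules jump to it, for example with iptables -A INPUT -j DSHIELD (and likewise for OUTPUT and FORWARD), which matches the setup described above.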

January 6, 2006 · 1 min · 164 words · Jim Thario