February 2016
No. 9

Recent Happenings at Infometrix
Infometrix began its 39th year on January 1, 2016, and we are celebrating by honoring our co-founder Bruce Kowalski. Brian Rohrback, our president and CEO for the past 32 years, has helped to establish The Kowalski Fund (contact: Katherine Day Hase, 206-616-4929) at the University of Washington, designed to support graduate students with grant money and tuition waivers and to fund an endowed chair as the focus for this multidisciplinary initiative. The rich heritage of chemometrics at the University through Bruce Kowalski will be revived and enhanced with funds from this donor program. We invite individuals and corporate partners to join us in supporting our future leaders and honoring the memory of Bruce Kowalski and the immeasurable impact he has had on introducing the use of chemometrics across myriad industries.

This is the 9th publication of our quarterly newsletter. Please look for helpful tips in chemometric analysis and for Pirouette software. We also continue to provide information about real-world applications of the chemometric method and important trends in the fields of automation, quality issues and systems optimization. Please read on.

Bio-Rad’s KnowItAll Environment
As a software development company, we often get asked to expand our product line. We definitely appreciate all the feedback and we keep a database of all the suggestions our customers provide to us. Thank you for that, by the way. In some instances, the suggestions are for particular chemometric enhancements and, as that is our business, we enter into the process of prioritizing and proving value (is it better than the existing techniques, can we document the analysis for which it will be preferred, can we fit the approach into an existing product or does it warrant a new design, is there enough of a market to justify the development expense?). In other instances, it seems to serve our customer base better if we form an alliance with a company that complements our own products. Over the years, we have not enjoyed a working relationship more productive or more friendly than the one forged with Bio-Rad’s Informatics Division a few years ago.

Although Pirouette functions as a database in that it is a convenient place to store data, results of chemometric analysis and multivariate models, all in a single file, the product lacks most of the common tools available in the modern database environment. In years gone by, we have integrated with database systems from Lotus (yes, Virginia, we are that old) and Microsoft, but these environments, as powerful and ubiquitous as they are or were, were not very satisfactory handlers of analytical instrument data – our bread and butter.

The CDC Mycobacteria (tuberculosis) database as a 3D scores plot inside KnowItAll MVP

Bio-Rad, purveyors of the Sadtler spectral library (dating to the late 1940s), converted this paper resource into silicon approximately a decade ago and spawned the KnowItAll software to give convenient access to this and many other public and private databases. Having built tools that provide both thick-client (the techy-sounding term for stand-alone software) and thin-client (think web-based) solutions, Bio-Rad added KnowItAll MVP to cover multivariate processing. Their system can access the core algorithms of Pirouette (Pirouette does not need to be independently available on the system) to do PCA and process any chemometric model in the Infometrix format. This product also lets you move data to and from Pirouette from within the interface.

So, if you use a lot of published data or if you want to handle your own data in a fast and secure environment, KnowItAll is worth a look. Let me know if you have any questions (
After Further Review...
As I have been at the chemometrics game for more than three decades, I have circulated through many industrial settings and found myself being asked about approaches to solving specific problems. Because chemometrics often (but not always) plays a role, there is no surprise in having these conversations. The curious thing is that often when a problem is raised and a solution is suggested, no action is taken. After all, a solution will certainly require an investment in time and often in money. This hesitance to work on the problem holds even when the financial benefit is substantial. Even more curious: year after year, the same group will contact me and raise precisely the same problem. When a generation has passed and a new crew is probing, I am lenient; if the query comes from the same person, I find myself prone to lecture. I have a simple outlook: if you are unwilling to take the steps necessary to solve the problem, you do not have a problem; spend your time doing something else in your world.

This is particularly evident in the oil industry, which pains me because that is my origin. In the 1970s and 1980s, the oil companies operated differently and decisions were made based on a simple profitability equation: if the cost of a project was x, the net present value of the savings was y, and y exceeded x, the project was done. Now, it appears that the only consideration is one of budget; the industry only looks at the cost of saying YES, never the cost of saying NO. With accountants in charge of a refinery or a chemical plant, what you should expect is a well-documented decline.

Well, maybe the next generation will figure this out.

Brian G. Rohrback

Tech Tip: Validating Chemometric Models
When performing predictions in Pirouette, there is a set of computed objects generated for each sample; the objects vary with algorithm. Examples include the Outlier Diagnostics and X Residuals. In addition, there are other objects that are created for the prediction data set as a whole, such as an Error Analysis in PLS or a Misclassification Matrix in SIMCA. However, these objects require that information about the prediction data be present before the calculations can be accomplished. Pirouette looks for this information before performing the calculations.

Specifically, if the data set you intend to use in prediction is for validating a model, then, by definition, it is expected that values for the properties of interest have already been defined for these data: either Y values for regression algorithms or category values for classification algorithms.

What, then, does Pirouette actually look for when asked to do a prediction? In a regression algorithm, Pirouette first checks whether the prediction data contain a Y variable whose name matches any of the Y variables in the model. Note that the name is case sensitive. Next, Pirouette verifies that the matching Y variable is included. Finally, Pirouette checks whether any values for the Y variable are missing. If a matching Y variable is present, is included, and has no missing values, then Pirouette can compare predicted values to known values, allowing computation of the Error Analysis items as well as a plot of Measured vs Predicted for the Y Fit object. If any of these checks fails, the underlying comparison of known to predicted cannot be performed, and neither the Error Analysis nor the Measured vs Predicted portion of Y Fit can be computed. Note that the Y Fit object will still be created, because this object also computes prediction limits around the predicted values, which are not affected by these conditions.
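The regression checks described above can be sketched in a few lines of Python. This is only an illustration of the decision logic as the text describes it, not Pirouette's actual code or API; the `YVariable` class, its fields, and the function name are hypothetical, and a missing value is represented here by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class YVariable:
    """Hypothetical stand-in for a Y column in the prediction data."""
    name: str
    included: bool
    values: List[Optional[float]]  # None marks a missing value (assumption)


def can_compare_known_to_predicted(model_y_names, prediction_vars):
    """Apply the three checks to the first Y variable whose name matches the model."""
    for var in prediction_vars:
        # Check 1: the name must match a model Y variable exactly (case sensitive).
        if var.name not in model_y_names:
            continue
        # Check 2: the matching variable must be included, not excluded.
        if not var.included:
            return False
        # Check 3: none of its values may be missing.
        if any(v is None for v in var.values):
            return False
        return True
    # No Y variable with a matching name was found.
    return False
```

For example, a model trained on a Y variable named "Octane" would not be validated against a prediction column named "octane", because the comparison is case sensitive.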

In classification algorithms, the checks are a little more lenient. The primary condition is that a category variable of the same name must be present. Again, this name is case sensitive. If this condition is met, the Misclassification Matrix will be produced because it will be possible to compare the predicted category with a known value. Note that a category variable with a matching name can be excluded; the Misclassification Matrix will still be generated. The category may also have missing values, yet the classification will still proceed; the only difference is that there will be fewer values in the Misclassification Matrix. Of course, if the Category column is present but all values are missing, then no Misclassification Matrix can be formed.
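The more lenient classification rules can be sketched the same way. Again, this is a hedged illustration of the logic described in the text rather than Pirouette's implementation: the function name, the dictionary layout, and the use of `None` for a missing category are all assumptions made for the example.

```python
from collections import Counter


def misclassification_counts(model_category_name, prediction_categories, predicted):
    """Count (known, predicted) pairs, or return None if no matrix can be formed.

    prediction_categories maps a category variable name to its list of known
    classes, one per sample, with None marking a missing value (assumption).
    Whether the variable is included or excluded is deliberately ignored,
    since the text says an excluded category variable still works.
    """
    # Dictionary lookup is an exact, case-sensitive name match.
    known = prediction_categories.get(model_category_name)
    if known is None:
        return None  # no category variable with a matching name
    # Samples with a missing known category are simply skipped.
    pairs = [(k, p) for k, p in zip(known, predicted) if k is not None]
    if not pairs:
        return None  # column present, but every value is missing
    return Counter(pairs)
```

Samples with a missing known category contribute nothing to the counts, which mirrors the point that the matrix simply holds fewer values in that case.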

Validation is an important part of any modeling workflow and Pirouette has been designed to anticipate this objective by computing specific objects that will help model evaluation. It is up to the analyst to ensure that the appropriate validation data—in the form

