Infometrix
The Premier Chemometrics Company



Aug 2015
No. 7


Looking to Fall 2015.
Hopefully all of you are enjoying your summer, wherever you may be. With just a few weeks of summer left, now is the time to plan for fall. Remember that the Infometrix October training course is not far away. To save a spot, register by completing an application form and sending it to info@infometrix.com. Also visit our website, as new information is added from time to time. We recently added a page with videos on related topics, as well as an Infometrix local area information page with travel-related guides, information and links.

In this edition, look for a Tech Tip on outlier diagnostics, a feature article on Topological Quantitation, and the schedule of upcoming events.
Recent Happenings at Infometrix
We apply the principles embodied in standard operating procedures for much of what we do in industry. One exception: chemometrics modeling. Every chemometrics model has a bit of art project to it. The question is: can we succeed at doing multivariate calibrations in an optimized way and yet uniformly across practitioners and locations? A consistent approach to addressing the topology of the data set is a good place to start.

From the newsroom: Infometrix has always tried to be more generous with our support than, say, Microsoft. As Windows 10 gains its footing, we find it more difficult to support Windows XP as new products roll out. Our current line of products supports XP, but conflicts have arisen. If we drop support for anything pre-Vista in the next version, will that be a cause for concern? If so, let us know!
Topological Quantitation
Topology is a branch of mathematics that deals with continuity and connectivity in a data set and traces its roots back to Leonhard Euler in 1736. Several texts are devoted to this topic. [1, 2]

When optical spectroscopy is used to characterize substances whose source varies, as in crude oil characterization or the monitoring of refinery fractions, a single multivariate model is often not adequate to handle the complexity and non-linearity of the sample/instrument combination. As a result, the process should be monitored in a way that allows a more localized approach. [3, 4] This is often referred to as topological quantitation or topological mapping, as it is designed to follow the variations inherent in the data.

There are three mechanisms for handling this problem.
  • One is to employ quantitation algorithms that are built for non-linear applications. Examples include non-linear regression models fitted with methods such as Gauss-Newton or gradient descent. The advantage is that we can deploy a single algorithm to cover a variety of cases. The disadvantage is that the non-linear approach can overfit the data, producing models that look good in calibration but can underperform in routine use. [5]
  • The second mechanism is to use a succession of linear regression models in a hierarchy.  This concept was first commercially introduced in the early 1990s to improve the performance of PLS predictions for the octane rating of gasoline.  Here all the models are fixed and one regression is used to find the best subsequent models for a “fine-tuning” of the assessment. [6]
  • The third approach is to have the situation dictate which samples will be used for a more-localized assessment.  Literature references are listed under the term Locally-Weighted Regression (LWR).  Where the hierarchical approach transitions from one fixed model to the next, the LWR technique uses the current spectrum as the center point and chooses spectra from a database (model) that are similar, building a localized model on demand and immediately using this model for prediction.  This approach has the advantage of not having to prepare models ahead of time, but still has the attribute of following the topology of the multivariate space. [3, 4]
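
The second mechanism above, a hierarchy of fixed linear models, can be sketched as follows. This is a minimal illustration in Python with NumPy and not the InStep implementation: the class name `HierarchicalModel`, the `edges` that split the coarse prediction into ranges, and the use of ordinary least squares in place of PLS are all assumptions made for brevity. A coarse regression gives a first estimate, and that estimate selects which fixed local model refines it.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with intercept (a stand-in for PLS in this sketch)."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

class HierarchicalModel:
    """Hypothetical two-level hierarchy: a coarse model routes each sample to a fine model."""

    def __init__(self, edges):
        # Boundaries on the coarse prediction that define the local ranges (assumed values)
        self.edges = np.asarray(edges, dtype=float)

    def fit(self, X, y):
        self.coarse = fit_linear(X, y)
        bins = np.digitize(predict_linear(self.coarse, X), self.edges)
        # One fixed local model per range, built ahead of time
        self.fine = {int(b): fit_linear(X[bins == b], y[bins == b])
                     for b in np.unique(bins)}
        return self

    def predict(self, X):
        first = predict_linear(self.coarse, X)
        bins = np.digitize(first, self.edges)
        out = first.copy()  # samples in a range with no local model keep the coarse estimate
        for b, coef in self.fine.items():
            mask = bins == b
            out[mask] = predict_linear(coef, X[mask])
        return out

# Synthetic, noise-free demonstration: a curved response handled by three local lines
X = np.linspace(0.0, 4.0, 81)[:, None]
y = X[:, 0] ** 2
model = HierarchicalModel(edges=[2.0, 8.0]).fit(X, y)
rmse_coarse = np.sqrt(np.mean((predict_linear(model.coarse, X) - y) ** 2))
rmse_fine = np.sqrt(np.mean((model.predict(X) - y) ** 2))
```

In this toy case the fine-tuned predictions track the curvature that the single coarse line cannot, which is the motivation for the hierarchy.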
The latter two approaches are covered in the Infometrix software suite. The hierarchical approach is embodied in the product InStep. An LWR system is handled with the IPAK DLL. Prediction using LWR is available in the algorithm DLL and is used by Pirouette, InStep and third-party applications.
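
To make the LWR idea concrete, here is a minimal sketch in Python with NumPy. It is not the IPAK or Pirouette implementation: the function name `lwr_predict`, the neighbor count `k`, the Euclidean distance, the tricube weighting, and the use of weighted least squares instead of a local PCR or PLS step are all assumptions chosen to keep the example short. The current spectrum is the center point, the `k` most similar database spectra are selected, and a local model is built and used on the spot.

```python
import numpy as np

def lwr_predict(X_db, y_db, x_new, k=15):
    """Predict y for x_new from a local linear model on its k nearest database samples."""
    # Similarity: Euclidean distance from the query to every database sample (assumed metric)
    d = np.linalg.norm(X_db - x_new, axis=1)
    idx = np.argsort(d)[:k]
    d_k = d[idx]
    # Tricube weights: the closest neighbors count most, the farthest almost nothing
    w = (1.0 - (d_k / (d_k.max() + 1e-12)) ** 3) ** 3
    # Center on the query so the fitted intercept is the prediction at x_new
    A = np.hstack([np.ones((len(idx), 1)), X_db[idx] - x_new])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_db[idx] * sw, rcond=None)
    return float(coef[0])

# Synthetic non-linear "database" (illustrative data, not spectra)
rng = np.random.default_rng(0)
X_db = rng.uniform(-3.0, 3.0, size=(300, 2))
y_db = np.sin(X_db[:, 0]) + 0.5 * X_db[:, 1] ** 2
x_new = np.array([1.0, 0.5])
y_true = np.sin(1.0) + 0.5 * 0.5 ** 2

y_lwr = lwr_predict(X_db, y_db, x_new)

# A single global linear model on the same data, for comparison
A_all = np.hstack([np.ones((X_db.shape[0], 1)), X_db])
g, *_ = np.linalg.lstsq(A_all, y_db, rcond=None)
y_global = float(g[0] + x_new @ g[1:])
```

Nothing is prepared ahead of time here: the database itself plays the role of the model, and each prediction builds its own local fit.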

References:
  1. Bourbaki, Elements of Mathematics: General Topology, Addison-Wesley, 1966.
  2. Ryszard Engelking, General Topology, Heldermann Verlag, Sigma Series in Pure Mathematics, December 1989.
  3. Naes, T.; Isaksson, T.; Kowalski, B., Locally weighted regression and scatter correction for near-infrared reflectance data. Anal. Chem., 1990.
  4. Bouveresse, E.; Massart, D.L.; Dardenne, P., Modified algorithm for standardization of near-infrared spectrometric instruments. Anal. Chem., 1995.
  5. G.A.F. Seber and C.J. Wild, Nonlinear Regression. John Wiley and Sons, 1989.
  6. InStep Manual. Infometrix, Inc., first edition 1993.

After Further Review...
The past couple of years have allowed Infometrix to participate in a very diverse set of projects. We will write more about these activities in future newsletters.
  • We refined the optimization process for on-line spectrometers, eliminating roughly 50% of the prediction errors in a study spanning a dozen separate processing plants, several analyzer technologies, and multiple parameters.
  • We had the opportunity over the last five years to prepare a series of chemometric models that are used in tandem to assess the quality in a batch manufacturing process for the pharmaceutical industry.
  • We participated in the development of a new gas chromatograph for which we have integrated chemometrics (both alignment and pattern recognition technologies) directly into the control system. This allows anyone to convert the instrument into an application-specific appliance quickly and inexpensively.
  • We constructed an on-line multi-terabyte, centralized database that pulls in analytical data from diverse locations, automates the multivariate quality assessment, distills the critical information content, and delivers the results in real-time.
The best part about these projects is that they have led us to a better understanding of how and where to do the chemometric processing for maximum impact. Chemometrics is too often confined to R&D and, even when models are deployed in a process setting, the implementations are not easily maintained.

Change is in the air.

Brian G. Rohrback
President

To leave comments or questions on any of the topics presented in this newsletter, please visit the Discuss page on our website and look under the title of this newsletter.

Thank you for your support and continued readership.

In This Newsletter
-Introduction
-Topological Quantitation
-Upcoming Events
-Tech Tip: Model Diagnostics - In vs Out
-After Further Review...

Upcoming Events
FACSS - SciX 2015
September 27-October 2, 2015


Chemometrics Training Course
October 14-16, 2015

Gulf Coast Conference 2015
October 20-21, 2015

54th EAS
November 16-18, 2015

Peftec 2015
November 18-19, 2015

IFPAC 2016
January 24-27, 2016

Pittcon 2016
March 6-10, 2016

Tech Tip: Model Diagnostics - In vs Out
An outlier diagnostics object is computed for the factor-based algorithms in Pirouette. You should look at it whenever you make models and whenever you do predictions. Although several diagnostics are computed (you can see them all in the Table view), this note will focus on just the defaults for PCA: Sample Residual versus Mahalanobis Distance.

When you look at a PCA Scores plot (viewed as a 2D scatter plot), you may observe some samples outside the (95%) confidence ellipse. These samples will exhibit a larger Mahalanobis Distance. Conversely, if you select samples in the Outlier Diagnostics plot with a high Sample Residual (but not a high Mahalanobis Distance), you will see that they fall well within the ellipse in the Scores plot.

Remember that these two diagnostics tell different stories about a sample's relationship to the model. The in-model diagnostic (in this case, the Mahalanobis Distance) describes a sample's relationship to the other samples in the model, in the factor space. However, the out-of-model diagnostic, the residual part, cannot be seen quite so easily in the context of the scores. Instead, if you look at the X Residuals plot, you should be able to see where in its profile a high Sample Residual sample differs from the other samples in the model.
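
The difference between the two diagnostics can be illustrated with a short Python/NumPy sketch. The function names `pca_fit` and `outlier_diagnostics`, the synthetic data, and the exact scaling of each quantity are assumptions for illustration; Pirouette may normalize these values differently. The Mahalanobis Distance is computed from the scores (in-model), while the Sample Residual is the size of what the factors leave unexplained (out-of-model).

```python
import numpy as np

def pca_fit(X, n_factors):
    """Mean-center X and keep the first n_factors principal components."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_factors].T
    score_var = s[:n_factors] ** 2 / (X.shape[0] - 1)  # variance of each score column
    return mean, loadings, score_var

def outlier_diagnostics(X, mean, loadings, score_var):
    Xc = X - mean
    scores = Xc @ loadings
    # In-model: distance from the model center within the factor space
    mahalanobis = np.sqrt(np.sum(scores ** 2 / score_var, axis=1))
    # Out-of-model: norm of the part of each sample the factors do not reproduce
    x_resid = Xc - scores @ loadings.T
    sample_resid = np.sqrt(np.sum(x_resid ** 2, axis=1))
    return mahalanobis, sample_resid

# Synthetic two-factor data with six variables (illustrative only)
rng = np.random.default_rng(1)
P = np.array([[1, 1, 1, 1, 1, 1], [1, -1, 1, -1, 1, -1]]) / np.sqrt(6)
T = rng.normal(size=(50, 2)) * [5.0, 3.0]
X = T @ P + rng.normal(scale=0.05, size=(50, 6))
extreme = np.array([30.0, 0.0]) @ P             # far from center, but in the model plane
spiked = np.array([0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # near center, but off the model plane
X = np.vstack([X, extreme, spiked])             # rows 50 and 51

md, sr = outlier_diagnostics(X, *pca_fit(X, n_factors=2))
```

Here row 50 stands out only in `md` and row 51 only in `sr`, which is why it pays to inspect both axes of the diagnostics plot.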

These computed objects are also available when you do a PCA prediction. Note that if you are trying to evaluate a sample that is not in the training set, it could be useful to combine it with the training set, then do the prediction and look at all of the X residuals (model samples + prediction sample) to see how their behaviors differ.
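
The comparison suggested above can be sketched in Python with NumPy. The data, the two-factor model, and the variable names (`train_band`, `suspect`) are hypothetical, and this is not how Pirouette computes or displays its X Residuals object. The model is built from the training set only, then the training samples and the new sample are predicted together so that their residual profiles can be compared variable by variable.

```python
import numpy as np

# Synthetic training set: two underlying factors, six variables (illustrative only)
rng = np.random.default_rng(2)
P = np.array([[1, 1, 1, 1, 1, 1], [1, -1, 1, -1, 1, -1]]) / np.sqrt(6)
X_train = (rng.normal(size=(40, 2)) * [5.0, 3.0]) @ P
X_train = X_train + rng.normal(scale=0.05, size=(40, 6))

# New sample: ordinary scores, plus a feature at variable index 3 the model has never seen
x_new = np.array([2.0, 1.0]) @ P
x_new[3] += 2.0

# PCA model from the training samples only
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
loadings = Vt[:2].T

# Predict training samples and the new sample together, then take the X residuals
X_all = np.vstack([X_train, x_new])
Xc = X_all - mean
x_resid = Xc - (Xc @ loadings) @ loadings.T

# Compare the new sample's profile with the largest training residual at each variable
train_band = np.abs(x_resid[:-1]).max(axis=0)
new_profile = x_resid[-1]
suspect = int(np.argmax(np.abs(new_profile) - train_band))
```

The variable where the new sample's residual most exceeds the training band is the first place to look for what makes it different. Note that a projection can spread part of an unusual feature onto neighboring variables, so the profile is a guide and not a proof.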

For more tips, visit the FAQs and Tips or User Questions page on our website.

The power to define the situation is the ultimate power.

-Jerry Rubin, activist and author (1938-1994)

The Infometrix mission is to provide high quality, easy-to-use software for the handling of multivariate data.
Publications - Application of chemometrics to problems in a variety of research areas


 

FACSS SciX 2015 -  Infometrix will be presenting a talk on enhanced practices for doing calibrations of FTNIR and Raman instruments




PEFTEC 2015 - Infometrix will present a talk at this  conference and exhibition specializing in monitoring and analytical technologies for the petroleum, refining and environmental industries.




Pittcon 2016 - world’s largest annual premier conference and exposition on laboratory science
 



Copyright © 2015 Infometrix, Inc., All rights reserved.