The Premier Chemometrics Company

February 2015
No. 5

Happy 2015!
We want to thank many of you out there for making 2014 a record year for Infometrix. Our consulting services and product sales contributed to record growth and allowed us to hire a number of new staff members. During the year we added more expertise in spectroscopy and database programming to a staff that already included more than 100 work-years of experience in chemometrics, chromatography and spectroscopy. We are charged up and ready to break new records in 2015.

Over the next year or so, we will cover several topics in a series entitled Spectroscopy - Best Practices.  These will include sample selection, algorithm choice, database management, and some enhancements to customary practices.  To initiate this series, we outline below a cautionary note from FTNIR calibrations we have seen in the recent past.  In these cases, the assessments are related to octane rating (RON and MON), but similar situations arise in all industries.

Spectroscopy Best Practices - Part 1
There is a large body of chemometrics literature detailing algorithmic approaches for processing optical spectroscopy data and inferring properties of interest.  The descriptions are mostly learned and prescriptive, but they are usually tied to a single focused application and do not completely catalog the limits of the approach.  Also, for those not well-versed in the science, it can be a little overwhelming.

One of the most common problems lies in the selection of samples to use in a calibration.  This will be discussed in more detail in the future, but a hint of things to come (and consider) is represented in the two graphics.  The first graphic is a PCA scores plot: each blue point represents a single spectrum, and points that are close to one another are chemically similar.  In reality, you need to consider more than just the two dimensions shown.

 The PCA scores plot clearly shows a non-uniform distribution.
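For readers who want to see the computation behind a scores plot, here is a minimal sketch using numpy's SVD on hypothetical, randomly generated spectra (the data and variable names are ours, not from an actual calibration set):

```python
import numpy as np

# Hypothetical spectra: 30 samples x 100 wavelengths (stand-in for FTNIR data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(30, 100))

# Mean-center, then take the SVD; PCA scores are U * S, loadings are rows of Vt.
centered = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * s                       # rows: samples; columns: PC1, PC2, ...
loadings = Vt                        # rows: loading vectors

# Variance explained per factor, for judging how many PCs to inspect.
explained = s**2 / np.sum(s**2)

# Points close together in the (PC1, PC2) plane are spectrally similar,
# but remember to check PC3 and beyond as well.
pc12 = scores[:, :2]
```

The scores and loadings together reconstruct the centered data exactly, which is a handy sanity check on any PCA implementation.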

In this case, it is a mistake to lump all of the samples into a single PLS model.  You will find that additional factors may be needed in order to span this discontiguous space, which can destabilize the model.  If the gaps simply represent sample chemistries that were not analyzed, it is worth testing this by hand-blending samples to see the effect of making the calibration set more complete.
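One quick, informal check for this kind of discontiguity is to look at the spacings between sorted scores along a factor: one spacing far larger than the rest flags a region of sample chemistry that is not represented.  A sketch with hypothetical PC1 scores (two clusters with a gap, data invented for illustration):

```python
import numpy as np

# Hypothetical 1-D PC1 scores: two clusters separated by a gap,
# mimicking a discontiguous calibration space.
rng = np.random.default_rng(1)
pc1 = np.concatenate([rng.normal(0.0, 0.3, 20), rng.normal(5.0, 0.3, 20)])

# Sort the scores and examine consecutive spacings; a spacing far larger
# than the typical value marks a hole in the calibration coverage.
spacings = np.diff(np.sort(pc1))
gap = spacings.max()
typical = np.median(spacings)
needs_blending = gap > 10 * typical
```

When such a flag is raised, hand-blending intermediate samples (as suggested above) is one way to test whether filling the hole stabilizes the model.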

It is also critical to avoid using univariate descriptors to judge the effectiveness of a multivariate model.  Commonly, practitioners quote the r-squared value of the regression line, as in the plot below.  Here the red points compare the FTNIR predictions of motor octane with the reference values; although the r-squared value is reasonably good, the predictions are off for the majority of samples.

Comparison of FTNIR MON versus the reference values, or Y-fit.

Something about the chemistry of the blend has not been taken into account, which will result in less accurate assessments and possibly lower stability over time. Using a graphics-based chemometrics program like Pirouette increases your ability to see potential problems in the modeling process. The Tech Tip in the column to the right further explains the nuances of visualizing your data to determine how robust your spectroscopy data will be in the modeling exercise.
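To make the r-squared pitfall concrete, consider this small numerical sketch (octane values invented for illustration): predictions that track the reference perfectly in shape but carry a constant offset yield an essentially perfect r-squared, while the bias and RMSEP expose the real error.

```python
import numpy as np

# Hypothetical MON reference values and FTNIR predictions that track the
# reference perfectly in shape but carry a constant 0.8-octane offset.
reference = np.linspace(80.0, 90.0, 25)
predicted = reference + 0.8

# The r-squared of the regression line is essentially perfect...
r = np.corrcoef(reference, predicted)[0, 1]
r_squared = r**2

# ...yet error-based diagnostics expose the problem immediately.
bias = np.mean(predicted - reference)
rmsep = np.sqrt(np.mean((predicted - reference)**2))
```

Every prediction here is off by 0.8 octane numbers, an error that r-squared alone is blind to; always inspect bias, RMSEP, and the Y-fit plot itself.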
After Further Review...Consider the Source!

I attended the ISA Analysis Division meeting and found two different Raman companies that were making statements such as:
  • "Sure, we see models all the time that are constructed with 15, even 20 factors.  That is OK because there is more information content in Raman when compared to NIR."  Did you miss the lecture on overfitting a multivariate model?
  • "We have found that we can quantitate C9 isomers in a complex hydrocarbon matrix."  Not likely, in that optical spectroscopy is ultimately a functional group counter.
  • "We are able to accomplish these calibrations because we hire smart people."  I really should have thought about that when I assembled my team!
This reminds me of the excitement that overtook the industry when NIR first came on the scene.  It has taken nearly 20 years for that technology to settle into its sweet spot.  Understanding the strengths and limitations of an instrument is critical in choosing the technique to deploy in a monitoring or control setting.  The instrument companies have an agenda: a mass spec vendor will try to interpret the world through the inlet system of its box, and an optical spectroscopy vendor will always try to shed some light on the problem.  Choosing the appropriate technology falls to the end user.

Don’t get me wrong: I am a huge fan of optical spectroscopy; it is a great tool for myriad applications and generates a mountain of data for a very small runtime cost. What is needed is to apply the right tool for the job.

Brian G. Rohrback

In This Newsletter
-Spectroscopy Best Practices
  Part 1
-Upcoming Events
-Tech Tip: PC1 vs PC2
-After Further Review...

Upcoming Events

Pittcon 2015
March 8-12, 2015

ISA-Analysis Division Symposium
April 26-30, 2015

CPAC Meeting
May 11-12, 2015

AAPG Annual Convention and Exhibition 2015
May 31-June 3, 2015

XV Chemometrics in Analytical Chemistry
June 22-26, 2015

Chemometrics Training Course
October 7-9, 2015

Tech Tip: PC1 vs PC2

A user asked whether he should consider separation among samples evident in the second PCA score direction as real when the variance explained by the first factor was considerably larger. Keep in mind that variance may or may not be related to differences among the samples. Although PC1 describes the majority of the variation, it may have nothing to do with category separation; instead it may be explaining baseline offset, concentration differences, etc. The information in PC2 or PC3 may really be the key to category separation.

Looking at this from another angle, if there is separation and it is clear that PC2 is responsible for it, do we care that PC1 has explained more variance? Remember that we are really doing two things at once. First, PCA isolates the information in the data set in as few factors as possible, with the most information in PC1, the second most in PC2, and so on. Second, by isolating information in a small number of factors, we are more likely to observe trends, clusters, etc., that are naturally in the data than if we were to look at the original data one variable at a time.

If factor 2 is responsible for separating samples into groups, it is worthwhile looking at the second loading vector as well. Variables that are large (negative or positive) correspond to features in the raw data that are responsible for revealing sample differences. Use the scores and loadings together to enhance your understanding of the data set, both what and why.
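The situation described above can be reproduced with a small synthetic example (all data and names invented for illustration): a large baseline offset dominates PC1, while a smaller group difference confined to a few variables shows up on PC2, and the PC2 loading points to exactly those variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 50
# Large random baseline offset applied to every variable: dominates variance.
baseline = rng.normal(0.0, 5.0, size=(n, 1)) * np.ones((1, p))
# Group difference confined to variables 10-19, much smaller in magnitude.
labels = np.array([0] * 20 + [1] * 20)
group = np.zeros((n, p))
group[labels == 1, 10:20] = 1.0
data = baseline + group + rng.normal(0.0, 0.05, size=(n, p))

# PCA via SVD on the mean-centered data.
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * s
explained = s**2 / np.sum(s**2)

def effect(axis):
    """Group separation on one score axis, scaled by within-group spread."""
    a, b = scores[labels == 0, axis], scores[labels == 1, axis]
    pooled = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / pooled

# PC1 soaks up the baseline variance, yet the groups separate on PC2,
# and the PC2 loading is largest at the variables that actually differ.
pc2_loading = Vt[1]
```

Here the first factor explains nearly all of the variance but none of the category structure; the scores plot and the second loading vector together reveal both what separates the groups and why.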

For more tips, visit the FAQs and Tips or User Questions page on our website.

"Be who you are and say what you feel, because those who mind don't matter,
and those who matter don't mind."

-Bernard M. Baruch (1870-1965)

The Infometrix mission is to provide high quality, easy-to-use software for the handling of multivariate data.
Pittcon 2015
Infometrix will be attending Pittcon 2015. Schedule a time to meet with us to discuss needs and solutions for your industry.

LineUp - Reliable, user-friendly and widely applicable
Automatically corrects retention time shift


Copyright © 2015 Infometrix, Inc., All rights reserved.