We want to thank many of you out there for making 2014 a record year for Infometrix. Our consulting services and product sales contributed to record growth and allowed us to hire a number of new staff members. During the year we added more expertise in spectroscopy and database programming to a staff that already included more than 100 work-years of experience in chemometrics, chromatography and spectroscopy. We are charged up and ready to break new records in 2015.
Over the next year or so, we will cover several topics in a series entitled Spectroscopy Best Practices. These will include sample selection, algorithm choice, database management, and some enhancements to customary practices. To initiate the series, we outline below a cautionary note drawn from FTNIR calibrations we have seen in the recent past. In these cases, the assessments relate to octane rating (RON and MON), but similar situations arise in all industries.
Spectroscopy Best Practices - Part 1
A large body of chemometrics literature details algorithmic approaches for processing optical spectroscopy data and inferring properties of interest. The descriptions are mostly learned and prescriptive, but they are usually tied to a single, focused application and do not completely catalog the limits of the approach. Also, for those not well-versed in the science, the literature can be a little overwhelming.
One of the most common problems lies in the selection of samples to use in a calibration. We will discuss this in more detail in future installments, but a hint of things to come (and things to consider) is represented in the two graphics. The first is a PCA scores plot: each blue point represents a single spectrum, and points that lie close to one another are chemically similar. In practice, you need to consider more than just these two dimensions.
The PCA scores plot clearly shows a non-uniform distribution.
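For readers who want to see the mechanics behind a scores plot, the sketch below computes PCA scores directly with NumPy. The matrix sizes and values are invented for illustration and do not come from the calibrations discussed here:

```python
import numpy as np

# Minimal sketch of a PCA scores computation on invented "spectra".
# Rows are samples, columns are wavelengths; the values are synthetic
# stand-ins for a real calibration set.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(10, 50))

# Mean-center each wavelength, then take the SVD; the scores are U * S.
centered = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S  # one row of scores per sample

# Plotting scores[:, 0] against scores[:, 1] gives the 2-D scores plot;
# samples whose points lie close together have similar spectra, and
# gaps or clusters flag a non-uniform calibration set.
print(scores[:, :2].shape)  # (10, 2)
```

A packaged tool adds preprocessing choices, outlier diagnostics, and more dimensions than the two shown here; the point of the sketch is only the geometry of the scores.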
With a distribution like this, it is a mistake to lump all of the samples into a single PLS model: additional factors may be needed to span the discontiguous space, which can destabilize the model. If the gaps shown represent sample chemistries that were never analyzed, it is worth testing that hypothesis by hand-blending samples to see the effect of a more complete calibration set.
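For the curious, the sketch below shows the mechanics of PLS1 factor extraction (NIPALS) on invented data: each factor removes the variance in X most correlated with y, and the residual in y shrinks with every added factor. All names and values are hypothetical, not from any calibration discussed here:

```python
import numpy as np

# Bare-bones PLS1 (NIPALS) sketch on synthetic data -- an illustration
# of factor extraction, not a production implementation.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 40))            # 20 "spectra", 40 wavelengths
y = X[:, :3] @ np.array([1.0, -0.5, 2.0]) + 0.1 * rng.normal(size=20)

Xk = X - X.mean(axis=0)
yk = y - y.mean()
residual_norms = []
for _ in range(5):                       # extract 5 latent factors
    w = Xk.T @ yk
    w /= np.linalg.norm(w)               # weight vector
    t = Xk @ w                           # scores for this factor
    p = Xk.T @ t / (t @ t)               # X loadings
    q = (yk @ t) / (t @ t)               # y loading
    Xk -= np.outer(t, p)                 # deflate X
    yk -= q * t                          # deflate y
    residual_norms.append(np.linalg.norm(yk))

# Deflation is an orthogonal projection, so the y residual never grows;
# the hard question in a real calibration is where to stop adding factors.
print(residual_norms[0] > residual_norms[-1])  # True
```

The monotone shrinkage is exactly why a discontiguous calibration set is dangerous: the algorithm will happily spend extra factors bridging the gaps, whether or not those factors reflect real chemistry.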
It is also critical to avoid using univariate descriptors to judge the effectiveness of a multivariate model. Practitioners commonly quote the r-squared value of the regression line, as in the plot below. Here the red points compare the FTNIR prediction of motor octane with the reference values; although the r-squared value is reasonably good, the prediction is off for the majority of samples.
Comparison of FTNIR MON versus the reference values, or Y-fit.
Something about the chemistry of the blend has not been taken into account, and the result will be less accurate assessments and possibly lower stability over time. Using a graphics-based chemometrics program like Pirouette increases your ability to spot potential problems in the modeling process. The Tech Tip in the column to the right further explains the nuances of visualizing your data to judge how robust your spectroscopy models will be.
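The pitfall is easy to demonstrate numerically. In the invented example below, every prediction carries a constant 0.8-octane-number bias, yet the correlation r-squared is essentially perfect; the numbers are fabricated purely to make the point:

```python
import numpy as np

# Invented illustration: predictions that correlate perfectly with the
# reference values but are biased high by 0.8 octane numbers.
mon_reference = np.linspace(80.0, 95.0, 30)
mon_predicted = mon_reference + 0.8

r = np.corrcoef(mon_reference, mon_predicted)[0, 1]
mae = np.mean(np.abs(mon_predicted - mon_reference))

print(round(r**2, 6))  # 1.0: the r-squared looks flawless
print(round(mae, 2))   # 0.8: yet every single sample is off by 0.8
```

A single summary statistic compresses away exactly the per-sample behavior that matters; looking at the residuals themselves, as a graphics-based program encourages, exposes the bias immediately.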
After Further Review...Consider the Source!
I attended the ISA-Analyzer Division meeting, where two different Raman companies were making statements such as:
- "Sure, we see models all the time that are constructed with 15, even 20 factors. That is OK because there is more information content in Raman when compared to NIR." Did you miss the lecture on overfitting a multivariate model?
- "We have found that we can quantitate C9 isomers in a complex hydrocarbon matrix." Not likely, in that optical spectroscopy is ultimately a functional group counter.
- "We are able to accomplish these calibrations because we hire smart people." I really should have thought about that when I assembled my team!
This reminds me of the excitement that overtook the industry when NIR first came on the scene; it has taken nearly 20 years for that technology to settle into its sweet spot. Understanding the strengths and limitations of an instrument is critical in choosing the technique to deploy in a monitoring or control setting. The instrument companies have an agenda: a mass spec vendor will try to interpret the world through the inlet system of its box, and an optical spectroscopy vendor will always try to shed some light on the problem. Choosing the appropriate technology falls to the end user.
Don’t get me wrong: I am a huge fan of optical spectroscopy; it is a great tool for myriad applications and generates a mountain of data for a very small runtime cost. What is needed is to apply the right tool to the job.
Brian G. Rohrback