Prediction of absorption wavelengths using a combination of PM3 and QSPR modeling approaches

This case study is extracted from the following paper:

Chaudret R., Kiss Cs.F., Subramanian L., J. Photochem. Photobiol. A Chem. 2015, 299, 183–188. doi: 10.1016/j.jphotochem.2014.11.020.

For details not described in this case study, please refer to the paper.
 
Introduction

Light-emitting and light-absorbing systems (molecules, polymers, materials) are widely used in industry, from chemical sensors (in analytical chemistry or biochemistry) to the Organic Light Emitting Diode (OLED) devices found in televisions and mobile phones. The optical absorption or emission spectrum is one of the properties that play a key role in the usage and performance of such materials. Since the cost and time needed to develop a new molecule are often substantial, molecular modeling approaches are a good tool for screening a large number of potential candidates relatively rapidly. Such approaches have been shown to predict the absorption or emission wavelengths of many molecular or crystalline systems. However, for such an approach to be efficient, it must fulfill three major requirements:

  1. The approach should be quick for each molecule
  2. The approach should be automated in order to test many different molecules
  3. The approach should be reliable and the accuracy should be known even if it is not very high in the first round of screening, to provide direction towards the design of the best potential molecules

Although ab initio methods clearly offer the highest accuracy (requirement 3), they are highly time consuming and therefore not suitable for rapidly screening numerous candidates. On the other hand, a purely statistical modeling approach is quick but predicts well only for chemically similar candidates.
 
This case study presents a “smart screening” methodology that combines accuracy and speed by leveraging both a quantum mechanical (QM) method and a statistical approach. It demonstrates the multi-scale modeling capabilities of the MAPS software platform, coupling semi-empirical QM calculations with quantitative structure-property relationship (QSPR) analysis to model the first absorption wavelength of a large set of fluorophore and OLED molecules.
 
Method

All work reported here was performed using the MNDO and QSAR modules within the MAPS 3.4 platform. Forty organic molecules taken from the fluorophore and Jinno laboratory databases were sketched within MAPS. All the selected molecules have strong aromatic delocalization; their absorption wavelengths, taken as the band absorption maxima, range from 247 nm to 699 nm. Three different approaches were tested and compared. In the first, the first absorption wavelength was estimated using the PM3 semi-empirical method. In the second, a pure QSPR model was built using only Dragon6 descriptors. In the third, PM3/QSPR models were derived (see Figure 1) using a combination of PM3 absorption wavelengths and Dragon6 descriptors. Finally, the accuracy and predictivity of the different methods were compared. More details on the semi-empirical and QSPR procedures can be found in the related paper by Chaudret et al.
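
To make the third approach concrete, the sketch below fits a multilinear model in which the PM3 wavelength is one input among several descriptors. It is only an illustration under stated assumptions: the case study does not give the functional form of the PM3/QSPR model, so ordinary least squares is assumed here, the function names fit_linear_model and predict are hypothetical, and the numbers are toy values rather than data from the study. It does not use the MAPS or Dragon APIs.

```python
def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows    : list of feature tuples, e.g. (lambda_pm3_nm, descriptor_1, ...)
    targets : list of experimental wavelengths in nm
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, row):
    """Apply the fitted model to one molecule's feature tuple."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Toy illustration (made-up numbers): each row is (PM3 wavelength in nm, one descriptor).
toy_rows = [(300, 1), (400, 2), (500, 0), (600, 3)]
toy_experimental = [285.0, 380.0, 460.0, 565.0]
toy_coefs = fit_linear_model(toy_rows, toy_experimental)
print(toy_coefs)
```

In this picture the pure QSPR model corresponds to the same fit without the PM3 column, and the pure PM3 approach corresponds to using the computed wavelength directly with no fitted correction.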
 

Figure 1: View of MAPS QSPR interface

Figure 2: PM3 (blue), QSPR (green) and PM3/QSPR (orange) calculated wavelengths plotted as a function of the experimental ones. The y = x reference line is shown for comparison.

We compared results from pure PM3, pure QSPR and PM3/QSPR combination methods. The absolute and relative RMSE (for the different sets and the global one) are given in Table 1. Figure 2 represents a plot of the calculated wavelength as a function of the experimental values.
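
The two error measures can be sketched as follows. The case study does not state how the relative RMSE was defined, so the version below (root mean square of the per-molecule relative errors, in percent) is an assumption, and the function names are hypothetical.

```python
import math


def rmse(calculated, experimental):
    """Absolute root-mean-square error, in the units of the inputs (nm here)."""
    n = len(experimental)
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calculated, experimental)) / n)


def relative_rmse(calculated, experimental):
    """Root mean square of the relative errors, in percent.

    Assumed definition: each error is divided by its experimental value
    before squaring and averaging.
    """
    n = len(experimental)
    return 100.0 * math.sqrt(
        sum(((c - e) / e) ** 2 for c, e in zip(calculated, experimental)) / n
    )
```

Each function would be evaluated separately on the training, test, validation and total sets to fill a table such as Table 1.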

Of the three approaches, the pure QSPR model (approach 2) was the least accurate: for the full set of molecules, its RMSE is the highest and the R2 of its plot of calculated versus experimental wavelengths is the lowest (R2 = 0.75). PM3 semi-empirical calculations (approach 1) achieve better accuracy (see Table 1; R2 = 0.87). However, a significant drawback of both approaches is that their prediction error remains above 10% for the validation set, which estimates the predictivity of the model. Such an error is too large for many industrial applications and therefore needs to be reduced.
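
For reference, an R2 value of this kind can be computed as in the minimal sketch below. It assumes that R2 is the squared Pearson correlation of the calculated versus experimental wavelengths (equivalently, the R2 of a straight-line fit through the scatter plot); the case study does not spell out the definition, and the function name is hypothetical.

```python
def r_squared(calculated, experimental):
    """Squared Pearson correlation between calculated and experimental values."""
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(calculated) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, calculated))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in calculated)
    return sxy ** 2 / (sxx * syy)
```

Note that this measure reflects how well the points follow a straight line, not how close they are to y = x, which is why it is reported alongside the RMSE rather than instead of it.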
 

Table 1: Absolute and relative RMSE values for the different models (PM3, QSPR and PM3/QSPR) and the different sets (Validation, Test, Training and Total sets).

The PM3/QSPR model greatly improves both accuracy (R2 = 0.91) and predictivity (less than 6% error on the validation set) at a very low additional computational cost. It uses three Dragon descriptors in addition to the computed PM3 wavelengths.
 
The influence of the different descriptors was computed for both the pure QSPR and the PM3/QSPR models. For the pure QSPR model, no descriptor appears to be significantly more important than the others. In contrast, for the PM3/QSPR model, the PM3 absorption wavelength descriptor plays a key role in the predictivity of the model, while the other descriptors act as corrections to the calculated wavelength.
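
One common way to compare descriptor influence in a linear model is through standardized coefficients, sketched below. This is an assumption for illustration only: the case study does not say which influence measure was used, the function name is hypothetical, and the numbers are toy values rather than results from the study.

```python
import statistics


def standardized_coefficients(coefs, rows, targets):
    """Scale each regression coefficient by std(descriptor) / std(target).

    coefs   : fitted coefficients WITHOUT the intercept, one per descriptor
    rows    : list of descriptor tuples, one per molecule
    targets : experimental values used in the fit
    A larger absolute value means the descriptor moves the prediction more
    across the data set.
    """
    target_std = statistics.pstdev(targets)
    scaled = []
    for j, coef in enumerate(coefs):
        column_std = statistics.pstdev([row[j] for row in rows])
        scaled.append(coef * column_std / target_std)
    return scaled


# Toy illustration (made-up numbers): column 0 is a PM3 wavelength, column 1 a descriptor.
toy_rows = [(300, 1), (400, 2), (500, 0), (600, 3)]
toy_targets = [285.0, 380.0, 460.0, 565.0]
print(standardized_coefficients([0.9, 5.0], toy_rows, toy_targets))
```

In a model where the computed wavelength carries most of the signal, its standardized coefficient would be much larger in magnitude than those of the correcting descriptors.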
 
Conclusion:

The versatility and multi-scale capability of the MAPS platform allow very different approaches to be coupled in order to find the most accurate and fastest procedure for a given problem. In this study, the MAPS platform was used to set up a new approach for predicting the absorption or emission wavelengths of organic molecules both rapidly and accurately. The approach, which couples semi-empirical calculations (PM3) with QSPR models, was shown to have better accuracy and predictivity than pure semi-empirical (PM3) or pure QSPR approaches. In addition, the MAPS data management module (MAPS Database) and MAPS Python scripting capabilities make it possible to turn this approach into an automatic, rapid, and accurate method for evaluating and predicting the first absorption wavelength of organic molecules. Such an approach is fairly general and can be applied to screening molecules based on any property that can be computed using the molecular modeling approaches within MAPS.
 
References:

  1. Scienomics; MAPS platform, version 3.4, 2014, France
  2. http://scienomics.com/
  3. http://www.fluorophores.tugraz.at/substance/
  4. Jinno Laboratory, School of Materials Science, Toyohashi University of Technology, Toyohashi 441-8580, Japan: http://chrom.tutms.tut.ac.jp/JINNO/DATABASE/00database.html
  5. Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209–220.
  6. Todeschini R., Consonni V., Molecular Descriptors for Chemoinformatics, Wiley-VCH.
  7. Chaudret R., Kiss Cs.F., Subramanian L., J. Photochem. Photobiol. A Chem. 2015, 299, 183–188. doi: 10.1016/j.jphotochem.2014.11.020.