Before calibration, the data are checked for logical consistency against a set of criteria, so that bad data is ruled out in advance and does not become part of the NIR calibration. Are there duplicate copies of spectra in the data set? Do the duplicates carry different reference values? Do all the spectra cover the same wavelength range? Data cleaning prevents bad or incorrect data from being fitted with inappropriate treatments and over-fitted into seemingly good results. If outliers are instead removed on the basis of a NIR model built on uncleaned data, the samples removed are often not the bad data itself, because the model has already learned the bad data.
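The three questions above can be sketched as simple checks that run before any model is built. This is only an illustrative sketch, not the actual procedure used here. The function name `check_spectra`, the report keys, the in-memory data layout (one list entry per sample) and the rounding tolerance `decimals` are all assumptions made for the example.

```python
import numpy as np

def check_spectra(spectra, wavelengths, references, decimals=6):
    """Hypothetical pre-calibration check (names and layout are assumptions).

    spectra     : list of 1-D absorbance arrays, one per sample
    wavelengths : list of 1-D wavelength axes, one per sample
    references  : list of reference (lab) values, one per sample
    """
    report = {"duplicates": [], "conflicting_duplicates": [], "range_mismatches": []}

    # Wavelength range: compare every axis with the axis of the first sample.
    base = np.asarray(wavelengths[0], dtype=float)
    for i, w in enumerate(wavelengths):
        w = np.asarray(w, dtype=float)
        if w.shape != base.shape or not np.allclose(w, base):
            report["range_mismatches"].append(i)

    # Duplicates: spectra that are identical after rounding share one key.
    first_seen = {}
    for i, s in enumerate(spectra):
        rounded = np.round(np.asarray(s, dtype=float), decimals) + 0.0  # maps -0.0 to 0.0
        key = rounded.tobytes()
        if key in first_seen:
            j = first_seen[key]
            report["duplicates"].append((j, i))
            # Same spectrum but a different reference value is a logical conflict.
            if not np.isclose(references[j], references[i]):
                report["conflicting_duplicates"].append((j, i))
        else:
            first_seen[key] = i
    return report

# Made-up example data: sample 1 and sample 3 repeat the spectrum of sample 0,
# sample 3 has a different reference value and a different wavelength axis.
spectra = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.1, 0.2, 0.3]]
wavelengths = [[1000, 1100, 1200]] * 3 + [[1000, 1100, 1300]]
references = [12.0, 12.0, 9.5, 14.1]

report = check_spectra(spectra, wavelengths, references)
print(report)
```

Flagged samples are reported rather than deleted automatically, so that each case can be reviewed before the calibration is fitted. In this sketch, a duplicate is always paired with the first occurrence of its spectrum.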