Follow This Data Validation Process to Improve Your Data Science Accuracy

This article is intended for data scientists who are either starting a data validation process or want to improve their current one; it serves as a general outline with some examples. First, I want to define data validation, since it can mean different things in other, similar job roles. For the purpose of this article, data validation is the process of ensuring that the data used to train your model matches, or is in line with, the data it sees at inference. For some companies and some use cases, you will not need to worry about this issue, because the training and inference data come from the same source. This process is therefore needed, and only useful, when the data comes from different sources.

Data might not come from the same source if your training data is historical and custom-made (e.g., features derived from existing data), and/or if your inference data comes from live tables while your training data is a snapshot. All that to say, there are plenty of reasons for this mismatch to be present, and it will be incredibly beneficial to come up with a process that works at scale to ensure the data you feed your model at inference is what you, or rather the trained model, expects.
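To make that definition concrete, here is a minimal sketch of what such a check could look like, using only the Python standard library. It is an illustration under stated assumptions, not a prescribed implementation: the function names (`infer_schema`, `validate_inference_data`), the column names, and the sample rows are all hypothetical. It covers only the simplest kind of mismatch, namely missing columns, unexpected columns, and differing value types.

```python
from typing import Any


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each column name to the set of Python type names seen in it."""
    schema: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema


def validate_inference_data(
    train_rows: list[dict[str, Any]],
    inference_rows: list[dict[str, Any]],
) -> list[str]:
    """Return human-readable mismatches between training and inference data."""
    train_schema = infer_schema(train_rows)
    inference_schema = infer_schema(inference_rows)
    problems: list[str] = []

    # Columns the model was trained on but that are absent at inference.
    for column in sorted(train_schema.keys() - inference_schema.keys()):
        problems.append(f"missing at inference: {column}")

    # Columns that show up at inference but were never seen in training.
    for column in sorted(inference_schema.keys() - train_schema.keys()):
        problems.append(f"unexpected at inference: {column}")

    # Shared columns whose observed value types differ.
    for column in sorted(train_schema.keys() & inference_schema.keys()):
        if train_schema[column] != inference_schema[column]:
            problems.append(
                f"type mismatch in {column}: "
                f"train={sorted(train_schema[column])}, "
                f"inference={sorted(inference_schema[column])}"
            )
    return problems


# Hypothetical data: a historical training snapshot vs. a live inference row.
train_rows = [
    {"age": 34, "plan": "basic", "days_since_signup": 120},
    {"age": 51, "plan": "premium", "days_since_signup": 15},
]
inference_rows = [
    {"age": "34", "plan": "basic"},  # age arrives as a string, one feature is absent
]

for problem in validate_inference_data(train_rows, inference_rows):
    print(problem)
```

In practice you would likely run the same kind of comparison on DataFrames and extend it to value ranges and distributions, but the idea is the same: derive expectations from the training data, then test the inference data against them before it reaches the model.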