AWS announced a solution that combines Data Version Control, Amazon SageMaker AI, and SageMaker AI MLflow Apps to make ML model provenance fully traceable by linking each model to the specific data versions and experiments behind it.
AWS Machine Learning developed a method for building end-to-end lineage for machine learning models. It combines Data Version Control (DVC), scalable compute in Amazon SageMaker AI, and experiment management with SageMaker AI MLflow Apps. This makes it possible to trace exactly which data version went into a training run and which metrics that run produced.
The solution implements two lineage patterns: dataset-level and record-level. Each trained model artifact is linked to the exact version of the data, which is stored in Amazon S3 and committed to Git via DVC.