TrainingData.io brings Active Learning Radiology Data Pipeline with NVIDIA Clara

Machine learning model development can consume a lot of resources. It involves a continuous back-and-forth between the labeling team and the model development team, who work together on an active learning data pipeline.

Active Learning Data Pipeline (Source: State of the art in AI)

In one labeling cycle, the labeling team performs the following tasks:

  • ingests new training data into the labeling tools,
  • distributes new labeling tasks to labelers,
  • performs quality analysis (QA) on the new labels coming from labelers,
  • prepares the labels for the model development team.

In one development cycle, the model development team performs the following tasks:

  • ingests the labels generated by the labeling team,
  • converts the labels to the format accepted by the training infrastructure,
  • retrains the network,
  • feeds new datasets to the network,
  • discovers the edge cases that are failing,
  • prepares a dataset of edge cases to be labeled in the next labeling cycle.
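The development cycle above can be sketched as a small loop. This is only an illustration under stated assumptions, not TrainingData.io's or NVIDIA Clara's implementation: the names `train`, `predict`, `run_development_cycle` and `is_failure` are hypothetical, the study IDs and labels are made up, and the "model" is a toy lookup table standing in for a real network.

```python
from typing import Callable, Dict, List, Tuple

Label = str
Model = Dict[str, Label]  # toy stand-in for a trained network (assumption)


def train(labels: Dict[str, Label]) -> Model:
    """Hypothetical training step: the toy model memorizes its labels."""
    return dict(labels)


def predict(model: Model, study_id: str) -> Label:
    """Hypothetical inference step: unknown studies get a default label."""
    return model.get(study_id, "normal")


def run_development_cycle(
    labels: Dict[str, Label],
    new_studies: List[str],
    is_failure: Callable[[str, Label], bool],
) -> Tuple[Model, List[str]]:
    """One development cycle: retrain, predict on new data, collect failures."""
    model = train(labels)  # retrain on the ingested labels
    edge_cases = [
        study_id
        for study_id in new_studies  # feed new datasets to the model
        if is_failure(study_id, predict(model, study_id))
    ]
    # The edge cases become the input of the next labeling cycle.
    return model, edge_cases


# Usage with made-up data: the reviewer knows study "s3" contains a nodule.
truth = {"s1": "normal", "s2": "nodule", "s3": "nodule"}
model, edge_cases = run_development_cycle(
    labels={"s1": "normal", "s2": "nodule"},
    new_studies=["s2", "s3"],
    is_failure=lambda sid, pred: pred != truth[sid],
)
print(edge_cases)  # ['s3']
```

In this sketch the `is_failure` callback stands for whatever review process flags a bad prediction, so the loop itself stays independent of how failures are judged.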

Critical Step: Discover Edge Cases

When a machine learning model under development is used to generate predictions for new datasets, the humans in the loop (radiologists) have a crucial role to play. Based on their expertise, they choose the edge cases where the model performed poorly and correct the predictions it made. These edge cases need to be augmented and then used in the next training cycle.
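Once a radiologist has corrected a prediction, the amount of correction can serve as a rough signal of how badly the model did on that study. The sketch below ranks studies by the Dice overlap between the predicted and the corrected segmentation mask. Using Dice for this, the 0.8 threshold, the function names and the flat 0/1 mask format are all assumptions made for illustration; the source only says that radiologists pick edge cases based on their expertise.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[int]  # flat binary mask, one 0/1 value per voxel (assumption)


def dice_score(predicted: Mask, corrected: Mask) -> float:
    """Dice overlap between two binary masks of equal length."""
    if len(predicted) != len(corrected):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p & c for p, c in zip(predicted, corrected))
    total = sum(predicted) + sum(corrected)
    # Two empty masks agree perfectly, so treat them as a full overlap.
    return 1.0 if total == 0 else 2.0 * intersection / total


def select_edge_cases(
    reviews: Dict[str, Tuple[Mask, Mask]],
    threshold: float = 0.8,  # arbitrary cut-off chosen for this example
) -> List[Tuple[str, float]]:
    """Return (study_id, dice) pairs below the threshold, worst first."""
    scored = [
        (study_id, dice_score(pred, corr))
        for study_id, (pred, corr) in reviews.items()
    ]
    failing = [item for item in scored if item[1] < threshold]
    return sorted(failing, key=lambda item: item[1])


# Made-up reviews: (model prediction, radiologist correction) per study.
reviews = {
    "study_a": ([0, 1, 1, 0], [0, 1, 1, 0]),  # untouched prediction
    "study_b": ([1, 1, 0, 0], [0, 1, 1, 0]),  # partly corrected
    "study_c": ([0, 0, 0, 1], [1, 1, 0, 0]),  # fully redrawn
}
print(select_edge_cases(reviews))  # [('study_c', 0.0), ('study_b', 0.5)]
```

A ranking like this can only prioritize cases for review; the decision about which cases go into the next training cycle stays with the radiologist.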

TrainingData.io Has an Automated Active Learning Data Pipeline for Radiology Using NVIDIA Clara

NVIDIA Clara allows machine learning engineers to bring their own models (BYOM) and import them into Clara. Clara also makes inferencing possible through a web interface.

Privacy Preserving On-Premises Training & Inferencing

The complete solution has the following parts:

  • Project management, dataset management, ML model management in the cloud.
  • Docker container with NVIDIA Clara train-sdk for training & inferencing.
  • Docker container with the annotation tool, where edge cases are identified and corrected by the radiologist.