As data scientists we want to perfect our models, yet bigger data is not always better. Distinguishing strong signals in data and replicating insights takes priority over maintaining large datasets. Less is more.
In this session we will dive into:
• How having too much data can hurt the performance of ML models, and when less is better
• Whether synthetic data can be the glue of your ML pipeline, and if so, when and why
• The ramifications of bias and unfairness in data, and how they can be mitigated
• How we can mitigate privacy risks in data whilst preserving insights
• Optimising your data for multiple scenarios without introducing noise