Seldonian Machine Learning Toolkit
Safe and fair machine learning for researchers, data scientists, and beginners.
Seldonian ML algorithms...
- ... are machine learning algorithms, spanning supervised learning (regression and classification) and reinforcement learning. Regression example: Given data describing students' university applications and their subsequent grade point averages (GPAs) at the university (data link), regression algorithms can be used to predict future students' GPAs from their application materials. Classification example: Given data describing people convicted of crimes and whether they later committed a violent crime (data link), classification algorithms can be used to predict whether a newly convicted person will commit a violent crime in the future. Reinforcement learning (RL) example: RL algorithms can be used to automatically optimize the amount of bolus insulin injected by an insulin pump (study link), or to optimize sepsis treatment (study link).
- ... make it easier for data scientists to enforce safety and fairness constraints. In many cases, safety and fairness constraints are necessary for the responsible use of machine learning, because machine learning algorithms can misbehave even when used by the best experts in the field. Examples include IBM Watson recommending unsafe cancer treatments, and a machine learning system exhibiting racial bias when predicting whether a person will commit a violent crime (the predictions of this system were used by judges during criminal sentencing in 11 states).
- ... make it easier for data scientists to select the appropriate definition of safety or fairness. Machine learning has a wide variety of possible applications, including predicting who will commit a violent crime, predicting whether someone will repay a loan, deciding which résumés should be reviewed by a human, predicting how far a landslide will travel, optimizing bolus insulin dosing for type 1 diabetes treatment, and optimizing sepsis treatment. The appropriate definition of safety or fairness can differ for each application. Seldonian algorithms provide an interface that lets data scientists easily select the appropriate definition for the application at hand, and that makes it easy to enforce constraints using new definitions of safety and fairness when needed (see the sketch after this list).
- ... were introduced in a 2019 report published in Science, and are an active research topic at top machine learning conferences like NeurIPS, ICML, and ICLR. Seldonian algorithms were introduced in the report titled Preventing undesirable behavior of intelligent machines, published in Science (Vol. 366, Issue 6468, pages 999–1004, link, open access link). Since then, papers at NeurIPS, ICML, and ICLR have shown how Seldonian algorithms can be used to enforce fairness constraints in the contextual bandit and reinforcement learning regimes (link); how Seldonian reinforcement learning algorithms can account for the constantly changing nature of the real world, modeled using nonstationary Markov decision processes (link); how Seldonian classification algorithms can use data from one city to train models that remain fair when deployed in a different city with different demographics (link); how Seldonian reinforcement learning algorithms can provide various types of generalization guarantees for distributions over tasks (link); how Seldonian reinforcement learning algorithms can use a model-based approach rather than high-variance Monte Carlo approaches based on importance sampling (link); and how Seldonian reinforcement learning algorithms can be secured against (adversarially) corrupted training data (link). They have also driven research into reducing data requirements for safety and fairness guarantees related to statistical risk measures (link), overcoming technical challenges that limited the possible definitions of safety and fairness (link), and formal verification (machine-checked proofs) of the safety and fairness guarantees of Seldonian algorithms (link).
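To make the constraint-selection interface concrete, here is a minimal sketch of specifying a fairness definition as a constraint string, following the constraint-string format shown in the Engine's tutorials. The sensitive-attribute columns M and F, the 0.2 disparity threshold, and the confidence level are illustrative assumptions, not the only choices the toolkit supports.

```python
# A minimal sketch of specifying a fairness constraint, following the
# constraint-string format used in the Seldonian Engine's tutorials.
# The columns M/F, the 0.2 threshold, and delta are illustrative
# assumptions, not fixed choices.
from seldonian.parse_tree.parse_tree import make_parse_trees_from_constraints

# Demographic parity: the positive prediction rates (PR) for the
# M and F groups may differ by at most 0.2.
constraint_strs = ['abs((PR | [M]) - (PR | [F])) <= 0.2']

# Each constraint must hold with probability at least 1 - delta.
deltas = [0.05]

parse_trees = make_parse_trees_from_constraints(
    constraint_strs,
    deltas,
    regime='supervised_learning',
    sub_regime='classification',
    columns=['M', 'F'])
```

Selecting a different definition of fairness then amounts to writing a different constraint string over the same measure functions.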
This Seldonian ML Toolkit...
- ... is a tool for researchers. It includes code both to use Seldonian algorithms (the Engine repository) and to evaluate their performance for scientific studies and papers (the Experiments repository). It was deliberately designed to be modular, making it easy for other researchers to extend and improve the existing algorithms and to scientifically evaluate novel methods.
- ... is a tool for data scientists. The Python 3 API for this toolkit makes it easy for data scientists to apply Seldonian machine learning. If you have a data set and want to run a Seldonian algorithm once, you can use the Engine repository to be up and running in minutes (see the run sketch after this list). The Experiments repository wraps the Engine, making it easy to experiment with Seldonian algorithms: it can help you determine how well they perform and how much data they need for your application and your safety/fairness constraints, and it lets you compare the Seldonian algorithms implemented in this toolkit to other machine learning methods and libraries.
- ... is a tool for beginners. It comes with a graphical user interface (GUI) that lets you easily specify constraints. In future releases, this GUI will allow you to build, train, and evaluate the safety and fairness of machine learning systems without any programming. For those learning to program, future versions of this GUI will show you the Python code for everything it does, providing an easy and hands-on way to learn programming for data science and machine learning.
- ... supports parametric machine learning. The current version of the toolkit does not support nonparametric models such as random forests or support vector machines. However, the Experiments repository still makes it possible to compare Seldonian algorithms to these types of models.
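As a rough sketch of the "up and running in minutes" workflow mentioned above, a single run with the Engine follows the pattern shown in its tutorials. Here, 'spec.pkl' is a placeholder path for a specification object (dataset, model, and constraints) that you have already built and saved.

```python
# A minimal sketch of running a Seldonian algorithm once with the
# Engine, following the pattern in its tutorials. 'spec.pkl' is a
# placeholder path for a previously saved specification object.
from seldonian.utils.io_utils import load_pickle
from seldonian.seldonian_algorithm import SeldonianAlgorithm

# The spec bundles the dataset, model, and behavioral constraints.
spec = load_pickle('spec.pkl')

# Run candidate selection, then the safety test on held-out data.
SA = SeldonianAlgorithm(spec)
passed_safety, solution = SA.run()

if passed_safety:
    print('Safety test passed; solution:', solution)
else:
    print('No solution satisfying the constraints was found.')
```

If the safety test fails, a Seldonian algorithm declines to return a solution rather than returning one that might violate the constraints.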