Understanding Black-box Predictions via Influence Functions

How can we explain the predictions of a black-box model? There are two ways of measuring the influence of a training instance. The first option is to delete the instance from the training data, retrain the model on the reduced training dataset, and observe the difference in the model parameters or predictions (either for individual predictions or over the complete dataset). The second, which influence functions provide, is to estimate that effect without retraining. Even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information.

Understanding Black-box Predictions via Influence Functions. International Conference on Machine Learning (ICML), 1885-1894, 2017.
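The first option described above, deleting a training instance and retraining, can be sketched in a few lines. This is only a toy illustration: the model (a least-squares slope through the origin), the data, and the names `fit` and `loo_influence` are assumptions made for the example and do not come from the paper or any library.

```python
# Leave-one-out influence by brute-force retraining (illustrative only).
# Model: 1-D least squares through the origin, y ~ theta * x.

def fit(points):
    """Closed-form least-squares slope for y = theta * x."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

def loo_influence(train, x_test):
    """Change in the prediction at x_test when each training point is removed."""
    theta_full = fit(train)
    changes = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]   # training set without point i
        theta_reduced = fit(reduced)          # retrain from scratch
        changes.append((theta_reduced - theta_full) * x_test)
    return changes

# Hypothetical data: three points on the line y = x and one off-trend point.
train = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 8.0)]
changes = loo_influence(train, x_test=5.0)

# Index of the training point whose removal moves the prediction the most.
most_influential = max(range(len(train)), key=lambda i: abs(changes[i]))
print(most_influential, changes)
```

Retraining once per training point is what makes this approach expensive for larger models, which is the cost that influence functions are meant to avoid.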
ICML 2017 Best Paper. Influence functions avoid retraining by using a first-order approximation of the effect of removing a sample from the training set on the model. In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction.

[Figure: visualised output. The test image for which the influences were computed is shown at the top left.]
This is a PyTorch reimplementation of Influence Functions from the ICML 2017 best paper: Understanding Black-box Predictions via Influence Functions by Pang Wei Koh and Percy Liang (https://dl.acm.org/doi/10.5555/3305381.3305576). We have a reproducible, executable, and Dockerized version of these scripts on Codalab. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products.

Upweighting a training point. Upweight a training point z by a small ε and define the resulting minimizer:

$$\hat{\theta}_{\epsilon, z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$$

The influence of upweighting z on the parameters is

$$\mathcal{I}_{\text{up,params}}(z) \stackrel{\text{def}}{=} \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$$

where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$ is the Hessian of the training loss. Applying the chain rule gives the influence on the loss at a test point:

$$\begin{aligned} \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) &\stackrel{\text{def}}{=} \left.\frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d \epsilon}\right|_{\epsilon=0} \\ &= \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} \\ &= -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}) \end{aligned}$$

Removing z from the training set corresponds to $\epsilon = -1/n$.

Perturbing a training input. For $z = (x, y)$, define the perturbed point $z_{\delta} \stackrel{\text{def}}{=} (x + \delta, y)$ and move ε mass from z to $z_{\delta}$:

$$\hat{\theta}_{\epsilon, z_{\delta}, -z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta)$$

$$\begin{aligned} \left.\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon}\right|_{\epsilon=0} &= \mathcal{I}_{\text{up,params}}(z_{\delta}) - \mathcal{I}_{\text{up,params}}(z) \\ &= -H_{\hat{\theta}}^{-1} \left( \nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta}) \right) \end{aligned}$$

When δ is small and the loss is differentiable in x, this becomes

$$\left.\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon}\right|_{\epsilon=0} \approx -H_{\hat{\theta}}^{-1} \left[ \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \right] \delta$$

and, setting $\epsilon = 1/n$,

$$\hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1} \left[ \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \right] \delta$$

Up to the constant factor 1/n, the influence of the perturbation on the test loss is

$$\begin{aligned} \mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^{\top} &\stackrel{\text{def}}{=} \left.\nabla_{\delta} L(z_{\text{test}}, \hat{\theta}_{z_{\delta}, -z})^{\top}\right|_{\delta=0} \\ &= -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \end{aligned}$$

Logistic regression. With labels $y \in \{-1, 1\}$, the logistic loss, and H the Hessian of the training loss, $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ is

$$-y_{\text{test}} \, y \cdot \sigma(-y_{\text{test}} \theta^{\top} x_{\text{test}}) \cdot \sigma(-y \theta^{\top} x) \cdot x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x$$

Influence functions can therefore be used to debug training data: $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ indicates which training points contribute most to the loss at a given test point.

Efficiency and experiments. Stochastic estimation avoids forming and inverting H explicitly and brings the cost down to O(np) for n training points and p parameters. On an ImageNet dog vs. fish task with 900 training examples per class, the paper compares an Inception v3 network with an SVM with RBF kernel. In the training-set (poisoning) attack guided by $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^{\top}$, which is built from $H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta})$, perturbing 1 training image per test image flipped the prediction on 57% of 591 test images, perturbing 2 flipped 77%, and perturbing 10 flipped 590/591. Related work on attacks distinguishes such training-set attacks from adversarial examples. Influence functions also help debug bad cases and find mislabeled data: with 10% of the labels flipped, ranking training points by self-influence $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$ is compared against ranking by training loss and against random ordering. A related paper is Less Is Better: Unweighted Data Subsampling via Influence Function.

When influence functions are compared with leave-one-out retraining, the predicted and actual changes in loss still correlate on a non-convex model once a damping term is added to H (0.86), and for an SVM the correlation reaches 0.95 when the hinge loss is smoothed. The method itself is straightforward.

Influence functions can of course also be used for data other than images, as long as you have a supervised learning problem. Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.
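As a rough check of the upweighting formula for logistic regression given above, the sketch below compares the first-order prediction $-\frac{1}{n}\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ with the change in test loss obtained by actually removing z and retraining. To stay dependency-free it uses a single scalar parameter, so $H^{-1}$ is a plain division. The dataset, the bisection-based `fit` helper, and all function names are assumptions made for this example; it is not the authors' implementation, which relies on Hessian-vector products instead of an explicit Hessian.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def loss(z, theta):
    """Logistic loss log(1 + exp(-y * theta * x)) for z = (x, y), y in {-1, +1}."""
    x, y = z
    m = y * theta * x
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def grad(z, theta):
    """Derivative of the logistic loss with respect to the scalar theta."""
    x, y = z
    return -y * x * sigmoid(-y * theta * x)

def hess(z, theta):
    """Second derivative of the logistic loss with respect to theta."""
    x, _ = z
    s = sigmoid(theta * x)
    return x * x * s * (1.0 - s)

def fit(data, lo=-50.0, hi=50.0, iters=200):
    """Minimise the summed loss by bisection on its (increasing) derivative.
    Assumes the data are not separable, so the root lies inside [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(grad(z, mid) for z in data) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def influence_up_loss(z, z_test, theta, data):
    """I_up,loss(z, z_test) = -grad(z_test) * H^{-1} * grad(z), H = mean Hessian."""
    H = sum(hess(zi, theta) for zi in data) / len(data)
    return -grad(z_test, theta) * grad(z, theta) / H

# Hypothetical, non-separable 1-D data: (x, y) pairs with y in {-1, +1}.
train = [(2.0, 1), (1.0, 1), (0.5, -1), (-1.0, -1),
         (-2.0, -1), (1.5, -1), (-0.5, 1), (3.0, 1)]
z_test = (1.0, 1)

theta_hat = fit(train)
n = len(train)
predicted, actual = [], []
for i, z in enumerate(train):
    # Removing z corresponds to epsilon = -1/n.
    predicted.append(-influence_up_loss(z, z_test, theta_hat, train) / n)
    theta_loo = fit(train[:i] + train[i + 1:])   # retrain without z
    actual.append(loss(z_test, theta_loo) - loss(z_test, theta_hat))

for p, a in zip(predicted, actual):
    print(f"predicted {p:+.4f}   actual {a:+.4f}")
```

In this one-parameter convex setting the predicted and actual changes always agree in sign; how closely the magnitudes match depends on n, since the approximation is first order in 1/n.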
With the rapid adoption of machine learning systems in sensitive applications, there is an increasing need to make black-box models explainable. The degree of influence of a single training sample z on all model parameters is calculated as:

$$\mathcal{I}_{\text{up,params}}(z) = \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$$

where ε is the weight of sample z relative to the other training samples. If there are n samples, it can be interpreted as 1/n.

Calculating the influence of individual training samples on the final predictions is straightforward. The results are returned as a dict. Harmful is a list of numbers, which are the IDs of the training data samples ordered by harmfulness.

On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
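The list of harmful training-sample IDs described above could be derived from raw influence scores roughly as follows. This is a hypothetical helper, not the reimplementation's API: the function name, the `top_k` parameter, and the example scores are invented, and the sign convention is an assumption (a positive $\mathcal{I}_{\text{up,loss}}$ means upweighting the sample raises the test loss, so the sample counts as harmful).

```python
def rank_by_influence(influences, top_k=3):
    """Split training-sample IDs into most helpful and most harmful.

    influences[i] is assumed to be I_up,loss for training sample i:
    negative values lower the test loss (helpful), positive values raise it.
    """
    order = sorted(range(len(influences)), key=lambda i: influences[i])
    helpful = order[:top_k]          # most negative influence first
    harmful = order[::-1][:top_k]    # most positive influence first
    return {"helpful": helpful, "harmful": harmful}

# Hypothetical influence scores for five training samples.
scores = [0.20, -0.50, 0.05, -0.10, 0.90]
result = rank_by_influence(scores, top_k=2)
print(result)
```

Both lists hold indices into the training set, so they can be used directly to look up the corresponding samples for inspection.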