Tutorials

This folder contains the DIANNA tutorial notebooks. To install the dependencies for the tutorials, run the following in the main dianna folder:

pip install .[notebooks]

🠊 For a general demonstration of DIANNA, click on the logo (Logo_ER10) or run it in Colab: Open In Colab.

🠊 For tutorials on how to convert a Keras, PyTorch, Scikit-learn or TensorFlow model to ONNX, please see the conversion tutorials.

🠊 For specific XAI methods (explainers):

Datasets and Tasks

Illustrative (Simple)

|*Data modality*|Dataset|*Task*|Logo|
|:------------|:------|:------|:----|
|*Images*|Binary MNIST| Binary digit *classification* | mnist_zero_and_one_half_size|
| |[Simple Geometric (circles and triangles)](https://doi.org/10.5281/zenodo.5012824)| Binary shape *classification* | SimpleGeometric Logo|
| |[ImageNet](https://image-net.org/download.php)| $1000$ classes natural images *classification* | ImageNet_autocrop|
|*Text*|[Stanford sentiment treebank](https://nlp.stanford.edu/sentiment/index.html)| Positive or negative movie reviews sentiment *classification* | nlp-logo_half_size|
|*Time series*|[Coffee dataset](https://www.timeseriesclassification.com/description.php?Dataset=Coffee)| Binary *classification* of Robusta and Arabica coffee beans | Coffee Logo|
| |[Weather dataset](https://zenodo.org/record/7525955)| Binary *classification* (warm/cold season) of temperature time series | Weather Logo|
|*Tabular*|[Penguin dataset](https://www.kaggle.com/code/parulpandey/penguin-dataset-the-new-iris)| $3$ penguin species (Adélie, Chinstrap, Gentoo) *classification* | Penguin Logo|
| |[Weather dataset](https://zenodo.org/record/7525955)| Next day sunshine hours prediction (*regression*) | Weather Logo|

Scientific use-cases

|*Data modality*|Dataset|*Task*|Logo|
|:------------|:------|:---|:----|
|*Images*|[Simple Scientific (LeafSnap30)](https://zenodo.org/record/5061353/)| $30$ tree species leaves *classification* | LeafSnap30 Logo|
|*Text*|[EU-law statements](https://zenodo.org/records/8200000)| Regulatory or non-regulatory *classification* | nlp-logo_half_size|
|*Time series*|Fast Radio Burst (FRB) dataset (not publicly available)| Binary *classification* of Fast Radio Burst (FRB) time-series data: noise or a real FRB. | FRB logo|
|*Tabular*|[Land atmosphere dataset](https://zenodo.org/records/12623257)| Prediction of "latent heat flux" (*regression*). The random forest model is used as an [emulator](https://github.com/EcoExtreML/Emulator) to replace the physical model [STEMMUS_SCOPE](https://github.com/EcoExtreML/STEMMUS_SCOPE) to predict global maps of latent heat flux. | Atmosphere Logo|

Models

The ONNX models used in the tutorials are available at dianna/models, or linked from their respective tutorial notebooks.

Summary of all Tutorials

All tutorials can be accessed by clicking on the dataset & task logo in the tables below.

Explainer outputs for models trained on datasets & tasks that are included in the dashboard are marked with the Streamlit Logo.

Illustrative (Simple)

|*Modality* \ Method|RISE|[LIME](https://youtu.be/d6j6bofhj2M)|Kernel[SHAP](https://youtu.be/9haIOplEIGM)|
|:-----|:---|:---|:---|
|*Images*|[mnist_zero_and_one_half_size](/dianna/tutorials/explainers/RISE/rise_mnist.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_mnist.ipynb) Streamlit Logo| Streamlit Logo |[mnist_zero_and_one_half_size](/dianna/tutorials/explainers/KernelSHAP/kernelshap_mnist.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_mnist.ipynb) Streamlit Logo|
| |[ImageNet_autocrop](/dianna/tutorials/explainers/RISE/rise_imagenet.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_imagenet.ipynb)| |[SimpleGeometric Logo](/dianna/tutorials/explainers/KernelSHAP/kernelshap_geometric_shapes.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_geometric_shapes.ipynb)|
|*Text*|[nlp-logo_half_size](/dianna/tutorials/explainers/RISE/rise_text.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_text.ipynb) Streamlit Logo|[nlp-logo_half_size](/dianna/tutorials/explainers/LIME/lime_text.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_text.ipynb) Streamlit Logo| |
|*Time series*|[Weather Logo](/dianna/tutorials/explainers/RISE/rise_timeseries_weather.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_timeseries_weather.ipynb) Streamlit Logo|[Weather Logo](/dianna/tutorials/explainers/LIME/lime_timeseries_weather.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_timeseries_weather.ipynb) Streamlit Logo| |
| | |[Coffee Logo](/dianna/tutorials/explainers/LIME/lime_timeseries_coffee.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_timeseries_coffee.ipynb)| |
|*Tabular*|[Penguin Logo](/dianna/tutorials/explainers/RISE/rise_tabular_penguin.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_tabular_penguin.ipynb) Streamlit Logo|[Penguin Logo](/dianna/tutorials/explainers/LIME/lime_tabular_penguin.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_tabular_penguin.ipynb) Streamlit Logo|[Penguin Logo](/dianna/tutorials/explainers/KernelSHAP/kernelshap_tabular_penguin.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_tabular_penguin.ipynb) Streamlit Logo|
| | Streamlit Logo |[Weather Logo](/dianna/tutorials/explainers/LIME/lime_tabular_weather.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_tabular_weather.ipynb) Streamlit Logo|[Weather Logo](/dianna/tutorials/explainers/KernelSHAP/kernelshap_tabular_weather.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_tabular_weather.ipynb) Streamlit Logo|

To learn more about how we approach masking for time-series data, please read our [Masking time-series for XAI](https://blog.esciencecenter.nl/masking-time-series-for-explainable-ai-90247ac252b4) blog post.

Scientific use-cases

|*Modality* \ Method|RISE|[LIME](https://youtu.be/d6j6bofhj2M)|Kernel[SHAP](https://youtu.be/9haIOplEIGM)|
|:-----|:---|:---|:---|
|*Images*| |[LeafSnap30 Logo](/dianna/tutorials/explainers/LIME/lime_images.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_images.ipynb)| |
|*Text*| |[nlp-logo_half_size](/dianna/tutorials/explainers/LIME/lime_text_eulaw.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_text_eulaw.ipynb) Streamlit Logo| |
|*Time series*|[FRB logo](/dianna/tutorials/explainers/RISE/rise_timeseries_frb.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_timeseries_frb.ipynb) Streamlit Logo| | |
|*Tabular*| | |[Atmosphere Logo](/dianna/tutorials/explainers/KernelSHAP/kernelshap_tabular_land_atmosphere.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_tabular_land_atmosphere.ipynb)|

IMPORTANT: Hyperparameters

Settings per explainer

The XAI methods (explainers) are sensitive to the choice of their hyperparameters! This sensitivity is investigated in this [master's thesis](https://staff.fnwi.uva.nl/a.s.z.belloum/MSctheses/MScthesis_Willem_van_der_Spec.pdf), which draws useful conclusions. The tables below give the default hyperparameters used in DIANNA for each explainer, as well as the choices made in some tutorials and their data modality (*i* - images, *txt* - text, *ts* - time series and *tab* - tabular). The main conclusions (🠊) from the thesis (on images and text) about the effect of the hyperparameters are also listed.

#### RISE

| Hyperparameter | Default value | ImageNet_autocrop (*i*) | mnist_zero_and_one_half_size (*i*) | nlp-logo_half_size (*txt*) | Weather Logo (*ts*) | FRB logo (*ts*) |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| $n_{masks}$ | **$1000$** | default | $5000$ | default | $10000$ | $5000$ |
| $p_{keep}$ | **optimized** (*i*, *txt*), **$0.5$** (*ts*) | $0.1$ | $0.1$ | default | $0.1$ | $0.1$ |
| $n_{features}$ | **$8$** | $6$ | default | default | default | $16$ |

🠊 The most crucial parameter is $p_{keep}$. Lower values of $p_{keep}$ lead to more sensitive explanations (observed for both images and text). Easier classification tasks usually require a lower $p_{keep}$, as this causes more perturbation in the input and therefore a more distinct signal in the model predictions.

🠊 The feature resolution $n_{features}$ exhibited an optimum at a value of $6$. Higher values can offer a finer-grained result but require (far) more masks ($n_{masks}$). This also depends on the scale of the phenomena in the input data that we want to take into account in the explanation.

🠊 Larger $n_{masks}$ will return more consistent results at the cost of computation time. If two identical runs yield (very) different results, these will likely contain a lot of (or even mostly) noise, and a higher value for $n_{masks}$ should be used instead.

#### LIME

| Hyperparameter | Default value | LeafSnap30 Logo (*i*) | Weather Logo (*ts*) | Coffee Logo (*ts*) | [nlp-logo_half_size](/dianna/tutorials/explainers/LIME/lime_text_eulaw.ipynb) (*txt*) |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| $n_{samples}$ | **$5000$** | $1000$ | $10000$ | $500$ | $2000$ |
| *Kernel width* | **$25$** | default | default | default | default |
| $n_{features}$ | **$10$** | $30$ | default | default | $999$ |

🠊 The most crucial parameter is the *Kernel width*: low values cause high sensitivity; however, that observation was dependent on the evaluation metric.

#### KernelSHAP

| Hyperparameter | Default value | mnist_zero_and_one_half_size (*i*) | SimpleGeometric Logo (*i*) | Atmosphere Logo (*tab*) |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| $n_{samples}$ | **auto/int** | $1000$ | $2000$ | $136588$ |
| $n_{segments}$ | **$100$** | $200$ | $200$ | default |
| $\sigma$ | **$0$** | default | default | default |

🠊 The most crucial parameter is the number of super-pixels $n_{segments}$. Higher values led to higher sensitivity; however, that observation was dependent on the evaluation metric.

🠊 Regularization had only a marginal detrimental effect; the best results were obtained using no regularization (no smoothing, $\sigma = 0$) or least squares regression.
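
To make the RISE remarks on $p_{keep}$ and $n_{masks}$ above more concrete, here is a minimal pure-Python sketch of a RISE-style relevance estimate. It is an illustration under stated assumptions, not DIANNA's implementation: the names `rise_saliency`, `toy_model` and `max_run_difference` are hypothetical, the input is a flat list of 8 features, and the low-resolution mask upsampling that $n_{features}$ controls is left out. It repeats the same explanation with two different random seeds so that the run-to-run consistency can be checked.

```python
import random


def rise_saliency(model, x, n_masks=1000, p_keep=0.5, baseline=0.0, seed=0):
    """Toy RISE-style relevance for a 1-D input (illustrative sketch only).

    Each random mask keeps a feature with probability p_keep and replaces it
    with `baseline` otherwise. A feature's relevance is the sum of the model
    scores over the masks that kept it, divided by n_masks * p_keep.
    """
    rng = random.Random(seed)
    relevance = [0.0] * len(x)
    for _ in range(n_masks):
        mask = [1 if rng.random() < p_keep else 0 for _ in x]
        masked = [xi if m else baseline for xi, m in zip(x, mask)]
        score = model(masked)
        for i, m in enumerate(mask):
            relevance[i] += score * m
    return [r / (n_masks * p_keep) for r in relevance]


def toy_model(values):
    """Hypothetical 'classifier' whose score depends only on feature 2."""
    return 1.0 if values[2] > 0.5 else 0.0


def max_run_difference(n_masks, p_keep=0.5):
    """Largest per-feature gap between two runs that differ only in seed."""
    x = [1.0] * 8
    run_a = rise_saliency(toy_model, x, n_masks=n_masks, p_keep=p_keep, seed=1)
    run_b = rise_saliency(toy_model, x, n_masks=n_masks, p_keep=p_keep, seed=2)
    return max(abs(a - b) for a, b in zip(run_a, run_b))
```

With this toy model, comparing `max_run_difference(20)` with `max_run_difference(5000)` should show the gap between two runs shrinking as the number of masks grows, which is the check suggested above for deciding whether $n_{masks}$ is large enough.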