
# Tutorials
This folder contains DIANNA tutorial notebooks. To install the dependencies for the tutorials, run (in the main dianna folder)
For a general demonstration of DIANNA, click on the logo or run it in Colab.
For tutorials on how to convert a Keras, PyTorch, scikit-learn or TensorFlow model to ONNX, please see the conversion tutorials.
For specific XAI methods (explainers):
- Click on the explainer names to watch explanatory videos for the respective method.
- Click on the logos for direct access to a tutorial notebook. Run the tutorials directly in Google Colab by clicking on the Colab buttons.
## Datasets and Tasks
### Illustrative (Simple)
|*Data modality*|Dataset|*Task*|Logo|
|:------------|:------|:------|:----|
|*Images*|Binary MNIST|Binary digit *classification*| |
| |[Simple Geometric (circles and triangles)](https://doi.org/10.5281/zenodo.5012824)|Binary shape *classification*| |
| |[ImageNet](https://image-net.org/download.php)|$1000$-class natural image *classification*| |
|*Text*|[Stanford sentiment treebank](https://nlp.stanford.edu/sentiment/index.html)|Positive or negative movie review sentiment *classification*| |
|*Timeseries*|[Coffee dataset](https://www.timeseriesclassification.com/description.php?Dataset=Coffee)|Binary *classification* of Robusta and Arabica coffee beans| |
| |[Weather dataset](https://zenodo.org/record/7525955)|Binary *classification* (warm/cold season) of temperature time-series| |
|*Tabular*|[Penguin dataset](https://www.kaggle.com/code/parulpandey/penguin-dataset-the-new-iris)|*Classification* of $3$ penguin species (Adelie, Chinstrap, Gentoo)| |
| |[Weather dataset](https://zenodo.org/record/7525955)|Next day sunshine hours prediction (*regression*)| |
### Scientific use-cases
|*Data modality*|Dataset|*Task*|Logo|
|:------------|:------|:---|:----|
|*Images*|[Simple Scientific (LeafSnap30)](https://zenodo.org/record/5061353/)|*Classification* of leaves from $30$ tree species| |
|*Text*|[EU-law statements](https://zenodo.org/records/8200000)|Regulatory or non-regulatory *classification*| |
|*Timeseries*|Fast Radio Burst (FRB) dataset (not publicly available)|Binary *classification* of Fast Radio Burst (FRB) timeseries data: noise or a real FRB.| |
|*Tabular*|[Land atmosphere dataset](https://zenodo.org/records/12623257)|Prediction of "latent heat flux" (*regression*). The random forest model is used as an [emulator](https://github.com/EcoExtreML/Emulator) to replace the physical model [STEMMUS_SCOPE](https://github.com/EcoExtreML/STEMMUS_SCOPE) to predict global maps of latent heat flux.| |
## Models
The ONNX models used in the tutorials are available at dianna/models, or linked from their respective tutorial notebooks.
## Summary of all Tutorials
All tutorials can be accessed by clicking on the dataset & task logo in the tables below.
The explainers' outputs for the models trained on the datasets & tasks that are included in the dashboard are marked with an icon.
### Illustrative (Simple)
|*Modality* \ Method|RISE|[LIME](https://youtu.be/d6j6bofhj2M)|Kernel[SHAP](https://youtu.be/9haIOplEIGM)|
|:-----|:---|:---|:---|
|*Images*|[MNIST notebook](/dianna/tutorials/explainers/RISE/rise_mnist.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_mnist.ipynb)| |[MNIST notebook](/dianna/tutorials/explainers/KernelSHAP/kernelshap_mnist.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_mnist.ipynb)|
| |[ImageNet notebook](/dianna/tutorials/explainers/RISE/rise_imagenet.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_imagenet.ipynb)| |[Geometric shapes notebook](/dianna/tutorials/explainers/KernelSHAP/kernelshap_geometric_shapes.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_geometric_shapes.ipynb)|
|*Text*|[Text notebook](/dianna/tutorials/explainers/RISE/rise_text.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_text.ipynb)|[Text notebook](/dianna/tutorials/explainers/LIME/lime_text.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_text.ipynb)| |
|*Time series*|[Weather notebook](/dianna/tutorials/explainers/RISE/rise_timeseries_weather.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_timeseries_weather.ipynb)|[Weather notebook](/dianna/tutorials/explainers/LIME/lime_timeseries_weather.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_timeseries_weather.ipynb)| |
| | |[Coffee notebook](/dianna/tutorials/explainers/LIME/lime_timeseries_coffee.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_timeseries_coffee.ipynb)| |
|*Tabular*|[Penguin notebook](/dianna/tutorials/explainers/RISE/rise_tabular_penguin.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_tabular_penguin.ipynb)|[Penguin notebook](/dianna/tutorials/explainers/LIME/lime_tabular_penguin.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_tabular_penguin.ipynb)|[Penguin notebook](/dianna/tutorials/explainers/KernelSHAP/kernelshap_tabular_penguin.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_tabular_penguin.ipynb)|
| | |[Weather notebook](/dianna/tutorials/explainers/LIME/lime_tabular_weather.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_tabular_weather.ipynb)|[Weather notebook](/dianna/tutorials/explainers/KernelSHAP/kernelshap_tabular_weather.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_tabular_weather.ipynb)|

To learn more about how we approach the masking for time-series data, please read our [Masking time-series for XAI](https://blog.esciencecenter.nl/masking-time-series-for-explainable-ai-90247ac252b4) blog post.
### Scientific use-cases
|*Modality* \ Method|RISE|[LIME](https://youtu.be/d6j6bofhj2M)|Kernel[SHAP](https://youtu.be/9haIOplEIGM)|
|:-----|:---|:---|:---|
|*Images*| |[Images notebook](/dianna/tutorials/explainers/LIME/lime_images.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_images.ipynb)| |
|*Text*| |[EU law notebook](/dianna/tutorials/explainers/LIME/lime_text_eulaw.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/LIME/lime_text_eulaw.ipynb)| |
|*Time series*|[FRB notebook](/dianna/tutorials/explainers/RISE/rise_timeseries_frb.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/RISE/rise_timeseries_frb.ipynb)| | |
|*Tabular*| | |[Land atmosphere notebook](/dianna/tutorials/explainers/KernelSHAP/kernelshap_tabular_land_atmosphere.ipynb) or [Colab](https://colab.research.google.com/github/dianna-ai/dianna/blob/main/tutorials/explainers/KernelSHAP/kernelshap_tabular_land_atmosphere.ipynb)|

## IMPORTANT: Hyperparameters
### Settings per explainer
The XAI methods (explainers) are sensitive to the choice of their hyperparameters! This [Master's thesis](https://staff.fnwi.uva.nl/a.s.z.belloum/MSctheses/MScthesis_Willem_van_der_Spec.pdf) investigates that sensitivity and draws useful conclusions.
The default hyperparameters used in DIANNA for each explainer, as well as the choices for some tutorials and their data modality (*i* - images, *txt* - text, *ts* - time series and *tab* - tabular), are given in the tables below.
The main conclusions from the thesis (on images and text) about the effect of the hyperparameters are listed below each table.
#### RISE
| Hyperparameter | Default value | (*i*) | (*i*) | (*txt*) | (*ts*) | (*ts*) |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| $n_{masks}$ | **$1000$** | default | $5000$ | default | $10000$ | $5000$ |
| $p_{keep}$ | **optimized** (*i*, *txt*), **$0.5$** (*ts*) | $0.1$ | $0.1$ | default | $0.1$ | $0.1$ |
| $n_{features}$ | **$8$** | $6$ | default | default | default | $16$ |

- The most crucial parameter is $p_{keep}$. Lower values of $p_{keep}$ lead to more sensitive explanations (observed for both images and text). Easier classification tasks usually require a lower $p_{keep}$, as this causes more perturbation of the input and therefore a more distinct signal in the model predictions.
- The feature resolution $n_{features}$ exhibited an optimum at a value of $6$. Higher values can offer a finer-grained result but require (far) more $n_{masks}$. This also depends on the scale of the phenomena in the input data that the explanation should take into account.
- Larger $n_{masks}$ will return more consistent results at the cost of computation time. If two identical runs yield (very) different results, these likely contain a lot of (or even mostly) noise, and a higher value for $n_{masks}$ should be used instead.
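
The roles of the three RISE hyperparameters can be illustrated with a toy, pure-Python sketch of random-mask relevance on a 1-D input. This is not DIANNA's implementation: `rise_relevance`, `toy_model` and `run_to_run_gap` are hypothetical names introduced here, and the masking is simplified (no smooth upsampling or random shifts). It only shows how $p_{keep}$ sets the fraction of the input that survives each mask, how $n_{features}$ sets the mask resolution, and how the gap between two runs shrinks when $n_{masks}$ grows.

```python
import random


def rise_relevance(model, x, n_masks=1000, p_keep=0.5, n_features=8, seed=0):
    """Toy RISE-style relevance scores for a 1-D input (not DIANNA's code)."""
    rng = random.Random(seed)
    n = len(x)
    relevance = [0.0] * n
    for _ in range(n_masks):
        # Coarse random pattern: each of the n_features cells is kept
        # with probability p_keep.
        coarse = [1 if rng.random() < p_keep else 0 for _ in range(n_features)]
        # Stretch the coarse pattern to the length of the input.
        mask = [coarse[i * n_features // n] for i in range(n)]
        score = model([xi * mi for xi, mi in zip(x, mask)])
        # Credit every kept position with the score of the masked input.
        for i in range(n):
            relevance[i] += score * mask[i]
    # Normalise by the expected number of masks that keep a position.
    return [r / (n_masks * p_keep) for r in relevance]


def toy_model(x):
    """Hypothetical model whose output depends only on position 10."""
    return 1.0 if x[10] > 0 else 0.0


def run_to_run_gap(n_masks):
    """Largest difference between two runs that differ only in their seed."""
    signal = [1.0] * 16
    a = rise_relevance(toy_model, signal, n_masks=n_masks, seed=1)
    b = rise_relevance(toy_model, signal, n_masks=n_masks, seed=2)
    return max(abs(u - v) for u, v in zip(a, b))


relevance = rise_relevance(toy_model, [1.0] * 16)
print("most relevant position:", relevance.index(max(relevance)))
print("gap with 20 masks:  ", run_to_run_gap(20))
print("gap with 5000 masks:", run_to_run_gap(5000))
```

With $16$ input positions and $n_{features} = 8$, every mask cell covers two neighbouring positions, so positions 10 and 11 always receive the same score: the resolution of the explanation is bounded by $n_{features}$.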
#### LIME
| Hyperparameter | Default value | (*i*) | (*ts*) | (*ts*) | [EU law (*txt*)](/dianna/tutorials/explainers/LIME/lime_text_eulaw.ipynb) |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| $n_{samples}$ | **$5000$** | $1000$ | $10000$ | $500$ | $2000$ |
| *Kernel width* | **$25$** | default | default | default | default |
| $n_{features}$ | **$10$** | $30$ | default | default | $999$ |

- The most crucial parameter is the *Kernel width*: low values cause high sensitivity, although that observation depended on the evaluation metric.
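
To see why the kernel width matters, the sketch below evaluates the exponential kernel used by the `lime` package to weight perturbed samples by their distance to the original input. That DIANNA applies exactly this form is an assumption, and the distance values are purely illustrative; `lime_kernel` is a name introduced here.

```python
import math


def lime_kernel(distance, kernel_width=25.0):
    """Exponential sample weight: sqrt(exp(-d^2 / width^2))."""
    return math.sqrt(math.exp(-(distance ** 2) / kernel_width ** 2))


# Illustrative distances between perturbed samples and the original input.
distances = [0.0, 10.0, 25.0, 50.0]
for width in (5.0, 25.0, 100.0):
    weights = [round(lime_kernel(d, width), 3) for d in distances]
    print(f"kernel width {width}: {weights}")
```

A small width gives noticeable weight only to samples very close to the original input, so the local surrogate model is fitted on few effective samples; a large width weights near and far samples almost equally.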
#### KernelSHAP
| Hyperparameter | Default value | (*i*) | (*i*) | (*tab*) |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| $n_{samples}$ | **auto/int** | $1000$ | $2000$ | $136588$ |
| $n_{segments}$ | **$100$** | $200$ | $200$ | default |
| $\sigma$ | **$0$** | default | default | default |

- The most crucial parameter is the number of super-pixels $n_{segments}$. Higher values led to higher sensitivity, although that observation depended on the evaluation metric.
- Regularization had only a marginal detrimental effect; the best results were obtained using no regularization (no smoothing, $\sigma = 0$) or least squares regression.
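
As background on why $n_{segments}$ and $n_{samples}$ interact, the sketch below uses the standard KernelSHAP formulation rather than anything taken from the thesis or from DIANNA's code: it computes the Shapley kernel weight of a coalition of segments and the share of all $2^{n_{segments}}$ coalitions that a given number of samples could cover at best. The function names are introduced here for illustration.

```python
from math import comb


def shapley_kernel_weight(n_segments, coalition_size):
    """Shapley kernel weight of a coalition with `coalition_size` segments kept."""
    if coalition_size in (0, n_segments):
        # The empty and the full coalition get infinite weight in theory;
        # implementations handle them as constraints instead.
        return float("inf")
    return (n_segments - 1) / (
        comb(n_segments, coalition_size)
        * coalition_size
        * (n_segments - coalition_size)
    )


def sampled_fraction(n_segments, n_samples):
    """Upper bound on the share of all coalitions that n_samples can cover."""
    return min(1.0, n_samples / 2 ** n_segments)


for n_segments in (10, 100, 200):
    print(n_segments, sampled_fraction(n_segments, n_samples=2000))
```

With $10$ segments, $2000$ samples can enumerate every coalition; with $100$ or $200$ segments they cover only a vanishing fraction, which is one reason the explanation can vary with these settings.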