📖 Review the Lecture notes

Working with multispectral satellite data

This lab is intended to give anyone who hasn’t worked with multispectral satellite data (in python) before some hands-on experience in doing so. The application (separate lab section) is the classification of water (though other classes, such as vegetation, are very similar) from images over the Alaskan and Canadian Arctic. The examples are from our paper on super-resolution fully convolutional neural networks (FCN).

The notebook covers how to import satellite images with rasterio, which is a python wrapper for gdal, a popular library for working with geospatial data. You can see how to visualize the data stored in an xarray with matplotlib, similar to earlier labs. Next, we turn the data into a regular numpy array, which you can manipulate with the python commands that you learned throughout the semester. Specifically, we perform some basic algebraic operations to compute the (Modified) Normalized Water Index (MNDWI) as a traditional way to classify water. The classification from NDWI is compared to one from the FCN in our paper and we compute various accuracy metrics between those two classification predictions.

We will try to do the full CNN implementation and training in a separate notebook.

There are not many specific lab questions; rather you can tinker with the codes to see if you understand every step and change any number of parameters. For example, instead of plotting a RGB image, you can explore different band combinations to see if water (or other features) stand out more in, e.g., green, blue, and near infrared. Instead of NDWI, you can plot the Normalized Different Vegetation Index (NDVI) and see what that looks like. You are asked to play around with the NDWI threshold to see how easy or hard it is to separate out the water with this approach. And you can do all of this with 8 different satellite images (you have to do ‘Restart and Run all’ when switching between images).

Enjoy.

Install python packages for geospatial data

To work with geospatial data, i.e. satellite imagery, we need to actually install some python packages that aren’t installed by default in Google Colab. pip, or ‘package installer for python’ is one way to do this. Another common one is (ana)conda. We can run pip directly from a jupyter notebook by preceding the usual linux command by an exclamation mark.

Rasterio and xarray are python packages to work with rasterized geo-referenced data.

Code

!pip install rasterio
!pip install rioxarray

Load other packages

Some other packages, not all of which may be strictly necessary for this notebook. The last couple of packages allows us to directly import images from URLs, rather than working with locally stored files.

Code

%matplotlib inline
import os
import numpy as np
import cv2
import rasterio as rio
import xarray as xr
import rioxarray
from osgeo import gdal

# for TOA correction:
import pandas as pd
from lxml import etree
import datetime

import tifffile as tiff

import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize': (10, 6)})

# Importing images:
from io import BytesIO
import requests
from requests import get
from PIL import Image

Download satellite image(s)

I selected a few Sentinel-2 images over rivers in the Alaskan and Canadian Arctic and hosted those on my own Google Drive. Let’s see if we can import these into this notebook.

For anyone that may want to do something similar in the future. There is a trick to edit the ‘share’ link that google provides into a direct link to the file that we can use to download it from a python code. If you have a file in your Google Drive, click Share and select ‘anyone with link can view’, google will give you a link that looks like this:

https://drive.google.com/file/d/YOUR_ID/view?usp=sharing

to open this link from a python script, like this one, we need to edit this to:

https://drive.google.com/uc?export=download&id=YOUR_ID

For example:

Code

# Note, for some reason that I have not yet been able to debug,
# if you want to select a different image you should do 'Restart and Rerun All'
# from the Runtime menu in the top.
choose_image = 1

if choose_image == 1: # yukon
  url = 'https://drive.google.com/uc?export=download&id=1fnk58WulRWKt5HJ-lbfG60y4d-Vo3RSN'
  url_mask = 'https://drive.google.com/uc?export=download&id=1dUV-Q0DehUlyG4D9_vxpml9eAixrjKvn'
  img_name = 'image1.tif'
  mask_name = 'mask1.tif'

elif choose_image == 2: #gage7
  url = 'https://drive.google.com/uc?export=download&id=1d3YQCy3PJHZ2ZULaGDY5-eMb5-BZAHVL'
  url_mask = 'https://drive.google.com/uc?export=download&id=1cKFPzQtWwW1UKgUb2rhnnj38jFetUayZ'
  img_name = 'image2.tif'
  mask_name = 'mask2.tif'

elif choose_image == 3: #gage2
  url = 'https://drive.google.com/uc?export=download&id=179PPA1ZOblysZqssXhNkDH68CuvjQ3S2'
  url_mask = 'https://drive.google.com/uc?export=download&id=1GyN7QE8_EJ2eOwqIkcMyG3V5D8F7RSIN'
  img_name = 'image3.tif'
  mask_name = 'mask3.tif'

elif choose_image == 4: # tanana
  url = 'https://drive.google.com/uc?export=download&id=1ZIN7DZOer2hJCVPEwHOBwokEWnII1wLN'
  url_mask = 'https://drive.google.com/uc?export=download&id=1P8zrL4bTqRg9xfA2wwSgx8MaqqDSvPxl'
  img_name = 'image4.tif'
  mask_name = 'mask4.tif'

elif choose_image == 5: # nenana
  url = 'https://drive.google.com/uc?export=download&id=1Xa1ALarTTh6tG4uEQfKdaetE7WkEO1FP'
  url_mask = 'https://drive.google.com/uc?export=download&id=1gRIHOEW4aztRfCQ81wYM9zFR5f7O-fyf'
  img_name = 'image5.tif'
  mask_name = 'mask5.tif'

elif choose_image == 6: # gage1
  url = 'https://drive.google.com/uc?export=download&id=1pRNrkEYQyafq25Rwdic6kN4eVE--VHCV'
  url_mask = 'https://drive.google.com/uc?export=download&id=1WyQH-jRfEPOcYXKvZoiD3L4nCtIZdf-Z'
  img_name = 'image6.tif'
  mask_name = 'mask6.tif'

elif choose_image == 7: # gage5
  url = 'https://drive.google.com/uc?export=download&id=1jOnk9TXbz_0Rn-pr7TIwNGjtC8CpLXLt'
  url_mask = 'https://drive.google.com/uc?export=download&id=1cd4hqoWORX3cBeOlTbBlTrrzhoElDp3a'
  img_name = 'image7.tif'
  mask_name = 'mask7.tif'

elif choose_image == 8: # gage4
  url = 'https://drive.google.com/uc?export=download&id=1Ra3FciJN3D2zMgK1QUNilp5kOayG898I'
  url_mask = 'https://drive.google.com/uc?export=download&id=1KT1EoCZsV_nmzVkjtGEfz8p_Mu-zf_Q7'
  img_name = 'image8.tif'
  mask_name = 'mask8.tif'

Next, we create an easy-to-use function to download a file from a URL into the jupyter notebook, and give this file a name as if we had it locally stored. Here is the function:

Code

def download_file(url, file_name):
    with open(file_name, "wb") as file:
        response = get(url)
        file.write(response.content)

And here is how easy it now is to download a satellite image from a URL link:

Code

download_file(url, img_name)

Open satellite image

The file we downloaded is not a regular ‘tif’ image, but a geotiff. It includes the intensities (reflectance) of each pixel for each color, as for a regular image, but allows for any number of colors (multi- or hyper-spectral data) and also includes a bunch of meta-data, such as the coordinates of each pixel, the coordinate reference system (CRS), the physical size of each pixel at the surface (e.g., 10 m by 10 m), and more.

For this reason, we cannot open this file with a regular python library for images, but use rasterio or rioxarray, which will turn the data into a rasterized image/array that we can work with. Rasterio is a python wrapper for gdal, which is a common software to work with geospatial data (Matlab has its own functions to work with GDAL).

Load our image into an x-array:

Code

sat_xarr = rioxarray.open_rasterio(img_name)

We can easily check out some of the properties by just call this array:

Code

sat_xarr

Which shows in the top the number of spectral bands (6 in this case) and nr of pixels in the \(y\) and \(x\) directions. The next line shows the total nr of values in this array, which is \(x\ \times y\ \times\) bands \(=13,436,040\) and those are of integer type. In the Coordinates section you see that the coordinates are simply called \(x\) and \(y\) and are of floating type and that the bands are numbered as 1, 2, …, 6 (they could be given different names, such as R, G, B). The final section lists some of the meta-data attributes.

You can extract those separately as:

Code

sat_xarr.attrs

Other important geospatial metadata are stored in the “accessor” .rio. To see all of those, you can always use python’s dir:

Code

dir(sat_xarr.rio)

One majorly important geospatial attribute is the coordinate reference system, CRS, which is EPSG:32609 in this case:

Code

sat_xarr.rio.crs

This looks a bit messy. Here’s a few ways to extract information more cleanly:

Code

crs = sat_xarr.rio.crs

print(crs.to_epsg())     # → 32609
print(crs.to_authority())# → ('EPSG', '32609')
print(crs.to_string())   # → 'EPSG:32609'
print(crs.to_proj4())    # → '+proj=utm +zone=9 +datum=WGS84 +units=m +no_defs'
# print(crs.to_wkt())      # → the full WKT string

We can also get the spatial resolution of this image, i.e. the size of each pixels in units of meters:

Code

sat_xarr.rio.resolution()

You can also extract \(x\) and \(y\) coordinates, e.g., as

Code

sat_xarr.x

which themselves are xarrays. Even though these are xarrays instead of numpy arrays, some numpy functionality works directly on these xarrays, for example:

Code

np.shape(sat_xarr)

Code

np.shape(sat_xarr.x)

Code

print("Minimum, maximum, and mean x-coordinates:", np.min(sat_xarr.x).values, np.max(sat_xarr.x).values, np.mean(sat_xarr.x).values)

By now perhaps you’re getting curious about that this image actually looks like!? Below we use matplotlib as usual to make a nice figure. Because satellite images can have any number of spectral bands, rather than just RGB, but computer monitors can actually only show 3 colors at once, we have to specify which colors we want to plot. RGB is what our eyes interpret as ‘true color’, but you can also select any other combination as ‘false color’ which can make certain features stand out more clearly (try it!).

When working with xarrays, we can use .sel to select spectral bands by their names, which are just 1, 2, 3, …, 6, in this case.

The actual six bands and their order in these Sentinel-2 images are:

Blue ~493 nm
Green ~560 nm
Red ~665 nm
Near Infrared ~833 nm
Shortwave Infrared ~1610 nm
Shortwave Infrared ~2190 nm

So, note, that the first three bands are BGR rather than the common RGB. Let’s plot these in the usual RGB order:

Code

plt.figure(figsize=(16, 12));
sat_xarr.sel(band=[3, 2, 1]).plot.imshow(rgb='band', robust=True,add_colorbar=True);
plt.axis("off");

Exercise: check out plotting different band combinations and see if the water ‘pops’ out more by selecting different bands.

Load data into numpy array

Most/all things you can do with numpy arrays, you can also do with xarrays, plus much more for georeferenced data. For purposes where we don’t explicitly need the georeferencing data, though, the easiest thing to do is to simply load the actual pixel values for multispectral values into a numpy array, so we can use all the familiar codes for standard vector/matrix linear algebra.

Turning an xarray into a numpy array is quite trivial (we’ll save both here with different variable names):

Code

sat_arr = np.asarray(sat_xarr)
sat_arr

Note the very last bit of this output: the array is of type 16-bit integer. Check for yourself what the maximum value is for 16-bits integers.

Check dimensions again:

Code

# check dimensions of sat_arr

Typically, to show RGB images or to do linear algebra in neural networks, we want the number of channels (colors/bands) to be the last dimension, and not the first as in this (x)array. We can do this easily as:

Code

sat_arr = np.moveaxis(sat_arr, 0, -1)
[nry, nrx, nrbands] = np.shape(sat_arr)
print("Nr of pixels in y and x",nry, nrx, "and nr of bands/channels/colors:", nrbands )

Check the range of values (min/max) for each band:

Code

# print min/max values in each band

Calculate and plot water indices

Training and then making predictions for a fully convolutional neural network for image segmentation (classifying each pixel) can certainly be done with a jupyter notebook like this, but does take several hours for a sufficiently complex model. Here, we stick with the ‘old school’ approach of classifying water (or, e.g., vegetation) with pixel-based indices that simply compare the brightness of a pixel in different spectral bands. You can then compare those predictions to those from a deep FCN. The main purpose of this lab is to expose you to a little bit of python code for working with satellite imagery, more than practicing with code related to neural networks.

The Normalized Difference Water Index (NDWI) is defined as: \[ \mathrm{NDWI} = \frac{\mathrm{green} - \mathrm{NIR} }{\mathrm{green} + \mathrm{NIR}} \]

In other words, it checks how green each pixel is (not blue, as we colloquially refer to the color of water), relative to near-infrared (NIR).

Technical detail: In python, and some other programming languages, you have to be a bit carefull of the type of your variables. For example, when you perform devisions of variables that are explicitly defined as integers, you can get some weird results. Look at the two examples below, which essentially perform the same operation of \(\frac{204-205}{204+205}\). In the first case, the values are defined as numpy arrays with just 1 integer values and in the second case they are defined as floats. The second answer is correct, whereas the former is probably not what anyone would want.

Code

(np.array(204,np.uint8) - np.array(205,np.uint8)) / (np.array(204,np.uint8) + np.array(205,np.uint8))

To be super explicit, I therefore just turn our Sentinel image data into floats:

Code

sat_arrf = sat_arr.astype('float');

Now we can calculate the NDWI index for each pixel. We can do so either by treating the arrays as single numbers, similar to how you would do this in Matlab (i.e. just adding, subtracting, multiplying, and dividing arrays element-by-element):

Code

green = sat_arrf[:,:,1]
nir = sat_arrf[:,:,3]
ndwi = (green - nir)/(green + nir)
print("Min and max of NDWI", np.round(np.min(ndwi),2), np.round(np.max(ndwi),2))

Or you can use numpy functions that spell out the operations more explicitly.

Code

ndwi2 = np.divide(     np.subtract(green,nir),   np.add(green,nir))

Can you check that these two approaches give the same results?

Hint

np.sum(1*(ndwi != ndwi2))

Code

# check that the above two definitions of NDWI give the same result

Make a plot of the NDWI values for this image:

Code

# plot NDWI

For each pixel, NDWI is a single number and if you think about the definition of the NDWI index \[ \mathrm{NDWI} = \frac{\mathrm{green} - \mathrm{NIR} }{\mathrm{green} + \mathrm{NIR}} \] a bit, you’ll realize that these values are –by definition– between -1 and +1. So this is a grayscale image (but ranging from -1 to 1 instead of the usual 0 to 1).

The idea behind using NDWI and similar indices is that the NDWI values are high for water and smaller for not-water. More specifically, if NDWI is higher than some threshold value, we can classify that pixel as water and if it is less than that threshold, we classify it as not-water, or land. In other words, we need to threshold the continuous (\(-1<\mathrm{NDWI}<+1\)) NDWI index to obtain a binary classification result of water vs. land.

In the next cell, I do this for one particular choice of threshold, define a separate variable for the resulting binary water-land mask, and plot the result.

Exercise: * Define a variable threshold with some value of your choice. * Create a new variable like ndwi_bin to hold a binary water/land mask.

Hint

ndwi_bin = np.copy(ndwi)

Use your threshold to turn the continuous/float NDWI values into a binary land/water mask.

Hint
Use easy python code like A[A>0]=1
Plot your binary mask to see if it looks good.
Play around with the threshold to see if you are able to better separate the water from the land pixels.
More challenging exercise: modify the code to classify pixels with \(\mathrm{NDWI_{min}} < \mathrm{NDWI} < \mathrm{NDWI_{max}}\) as water and land otherwise (for example, \(0.05 < \mathrm{NDWI} < 0.3\) is water and all other pixels are land). See if that gives better results.

Code

# Create binary land/water mask

# plot

Modified NDWI, or MNDWI

Another common approach to do the same thing is to use the modified normalized water index, or MNDWI. There is nothing more to this than to take a short-wave inferared (SWIR) band instead of NIR. In other words, we compare the green band of a pixel to an ‘even-more-red’ band for that same pixel. \[ \mathrm{NDWI} = \frac{\mathrm{green} - \mathrm{SWIR} }{\mathrm{green} + \mathrm{SWIR}} \] Which wavelengths all of these bands refer to is different for each satellite, and as a result, the NDWI/MNDWI values and the corresponding thresholds are also different for every satellite. For Sentinel-2, we actually have 2 SWIR bands that are included in the data provided to your here as the last 2 bands.

Exercise: Copy-paste and edit the cells above to explore MNDWI indices based on the green and SWIR-1 as well as SWIR-2 bands, adjust thresholds, and see which combination gives you the best water classification.

Compare to FCN predictions

As mentioned above, performing a full FCN classification from within a jupyter notebook within the time of a single lab is perhaps not feasible, so I just provide you with masks that my group produced for these same images using our super-resolution FCN. In other words, we actually classify the 10-m resolution Sentinel-2 image at 2-m resolution.

In the cell below, we download the mask, import into a georeferenced xarray and then immediately convert into a regular numpy array. For storage efficiency, the mask labels were stored as integers 0 (land) and 255 (water), whereas our NDWI derived masks was labeled as 0 (land) and 1 (water). To be consistent, I rescale the FCN labels to 0 and 1 as well.

Code

download_file(url_mask, mask_name)

Code

mask_xarr = rioxarray.open_rasterio(mask_name)

Code

mask_arr = np.squeeze(np.asarray(mask_xarr,dtype='float')/255)
np.shape(mask_arr)

Note the \(5 \times\) larger pixel dimensions.

** Exercise** Plot the FCN and NDWI classifications side-by-side.

Code

# Plot your own land/water mask from (M)NDWI thresholding and the FCN predictions side-by-side.

Question: which one looks better?

Accuracy metrics

We will treat the FCN prediction as the ‘ground truth’ to see how well, or poorly, a thresholded NDWI index performs relative to the deep neural network prediction. The specific FCN labels that I provide are from our super-resolution FCN, so they are actually at 5x higher resolution than the native Sentinel-2 image and the derived NDWI classification. We could either downscale the FCN predictions to the 10-m Sentinel resolution or upsample the NDWI classification to 2-m resolution. Let’s do the latter to obtain a high-resolution mask. There are several ways to do the upsampling. Below you can choose between Nearest Neighbor or a more sophisticated cubic interpolation.

Code

# ndwi_fine = cv2.resize(ndwi_bin, dsize=(np.shape(mask_arr)[1],np.shape(mask_arr)[0]), interpolation=cv2.INTER_CUBIC)
ndwi_fine = cv2.resize(ndwi_bin, dsize=(np.shape(mask_arr)[1],np.shape(mask_arr)[0]), interpolation=cv2.INTER_NEAREST)

**Exercise* As a first comparison between the FCN and NDWI predictions, take the difference between the two masks and plot that, together with a color bar to show pixels that are -1, 0, 1?

Code

# Plot difference between FCN and NDWI water/land masks

Precision, Recall, and F1-score in Image Segmentation

When evaluating image segmentation, each pixel is treated as an individual prediction (rather than each image).
A segmentation model assigns every pixel to a class — for example, water, forest, or urban — and we compare those predictions to the ground-truth mask.

Metric	Definition	Intuition
Precision	\(\displaystyle \text{Precision} = \frac{TP}{TP + FP}\)	Of all pixels the model labeled as a class (e.g., “water”), what fraction were actually that class? High precision → few false positives (the model rarely labels a non-water pixel as water).
Recall	\(\displaystyle \text{Recall} = \frac{TP}{TP + FN}\)	Of all true pixels belonging to that class, what fraction did the model correctly identify? High recall → few false negatives (the model rarely misses real water pixels).
F1-score	\(\displaystyle F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)	The harmonic mean of precision and recall — a balanced measure that penalizes extreme trade-offs. \(F_1 = 1\) means perfect segmentation; \(F_1 = 0\) means total failure.

How these apply in segmentation

Each pixel contributes to the counts of true positives (TP), false positives (FP), and false negatives (FN).
Metrics can be computed per class and then averaged.
Often, these metrics are also expressed as the Intersection over Union (IoU) or Dice coefficient, which are closely related:
- \(\text{IoU} = \frac{TP}{TP + FP + FN}\)
- \(\text{Dice} = \frac{2TP}{2TP + FP + FN} = F_1\)

In practice

Precision answers: “When I predict a class, how sure am I?”
Recall answers: “Am I finding all occurrences of that class?”
F1-score combines both to summarize segmentation quality with one number.

Confusion Matrix for Image Segmentation

A confusion matrix summarizes how well a classification or segmentation model performs by comparing predicted and true labels.

For a segmentation problem with N classes (e.g., water, forest, urban), the confusion matrix is an N×N table:

	Predicted: Class 1	Predicted: Class 2	…	Predicted: Class N
True: Class 1	TP₁	FP_1→2	…	FP_1→N
True: Class 2	FP_2→1	TP₂	…	FP_2→N
…	…	…	…	…
True: Class N	FP_N→1	FP_N→2	…	TP_N

How to read it

The diagonal elements (top-left → bottom-right) show correct predictions (true positives for each class).
Off-diagonal elements show misclassifications, where the model confused one class for another.

For example: - If many “forest” pixels are misclassified as “pasture,” the cell at (true=forest, predicted=pasture) will have a large value. - A perfect segmentation would produce a diagonal matrix (all predictions match ground truth).

Why it’s useful

Reveals which classes are confused with each other (e.g., “water” vs “sea/lake”).
Complements overall accuracy and F1 metrics by showing class-specific errors.
You can compute per-class precision, recall, or IoU directly from each row/column.

Visualizing it

Confusion matrices are often shown as heatmaps, where brighter diagonal cells mean better performance:

plt.imshow(confusion_matrix, cmap="Blues")
plt.xlabel("Predicted class")
plt.ylabel("True class")
plt.title("Confusion Matrix")
plt.colorbar()

## Example: Confusion Matrix for Binary Land–Water Segmentation

For **binary segmentation**, we only have two possible classes:
- **Positive class:** e.g., *Water*
- **Negative class:** e.g., *Land*

The confusion matrix becomes a simple **2×2 table**:

|                              | **Predicted: Water (Positive)** | **Predicted: Land (Negative)** |
|:-----------------------------|:-------------------------------:|:------------------------------:|
| **True: Water (Positive)**   | **True Positive (TP)** — pixels correctly classified as water | **False Negative (FN)** — water pixels incorrectly labeled as land |
| **True: Land (Negative)**    | **False Positive (FP)** — land pixels incorrectly labeled as water | **True Negative (TN)** — pixels correctly classified as land |


A **2×2 heatmap** of these values highlights where your segmentation is missing water (false negatives) or mislabeling land (false positives).

Let's code this up (using code stolen from [here](https://stackoverflow.com/questions/50021928/build-confusion-matrix-from-two-vector)). So the first element of the confusion matrix `cm` is the total number of pixels that were both predicted and labeled as land (by far the largest number), the last (bottom-right) element is the nr of pixels both predicted and labeled as water, etc.

::: {#cell-88 .cell}
``` {.python .cell-code}
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (ndwi_fine.astype('uint8'),mask_arr.astype('uint8')), 1)
[[TN,FN],[FP,TP]] = cm
cm

:::

We can now use the true-positive, true-negative, false-positive, and false-negative elements of the confusion matrix to define the precision, recall, and F1 score of the NDWI classification, relative to the FCN prediction, which we treat here as the ground truth:

Code

precision = TP/(TP+FP)
recall = TP/(TP+FN)
F1 = 2*precision*recall/(precision+recall)



print("Precision is:", int(100*precision),"%")
print("Recall is:", int(100*recall),"%")
print("F1 score is harmonic average of precision and recall:", int(100*F1),"%")

The numbers that you see here will depend on what image you are considering and whether you used NDWI or some MNDWI, and finally what threshold you selected. In general, you will see that NDWI can have a high recall, i.e. it catches most pixels that should be labeled as water, but a (often much) lower prediction, because it labels many pixels as water that are not water. Big rivers are easier to classify than small water bodies; and the middle of a big river is easier to classify than the precise shorelines. So NDWI can look pretty good for a huge river like the Yukon, but less and less good when looking at more complex scenes.

Conclusions

This short lab is primarily intended to just give you some very basic/introductory exposure to importing and manipulating multispectral satellite imagery in python / jupyter notebooks. Together with the more rigorous exploration of various machine learning algorithms throughout this semester, and their implementation from scratch in python, I hope this course will have given you a pretty solid starting point for any future research questions you may want to pursue that would benefit from python coding and/or machine learning.