I have a 2D array called no2 that is related to two other 2D arrays, sza and vza.
Test data (test.npz, 450 KB) can be downloaded from Google Drive.
Here's the overview:
import numpy as np
import matplotlib.pyplot as plt
data = np.load('test.npz')
sza = data['sza']
vza = data['vza']
no2 = data['no2']
fig, axs = plt.subplots(2, 2, figsize=(8, 6))
ax1, ax2, ax3, ax4 = axs.flat
m = ax1.pcolormesh(no2)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')
m = ax2.pcolormesh(sza)
plt.colorbar(m, ax=ax2)
ax2.set_title('sza')
m = ax3.pcolormesh(vza)
plt.colorbar(m, ax=ax3)
ax3.set_title('vza')
s = ax4.scatter(sza, no2, c=vza, s=1)
plt.colorbar(s, ax=ax4, label='vza')
ax4.set_xlabel('sza')
ax4.set_ylabel('no2')
plt.tight_layout()
I want to replace the two high no2 regions with values based on the surrounding background (low no2) to get something like this:
Because no2 seems to depend linearly on sza, as shown in the last subplot, I came up with three ideas:
Curve fit
Fit no2 against sza within several vza bins, then use the fit to calculate the background no2 that replaces the high no2 values:
fig, axs = plt.subplots(3, 4, figsize=(12, 6))
ax = axs.flat
for index, bin in enumerate(range(5, 65, 5)):
    mask = (vza > bin) & (vza < bin + 5)
    # print(index)
    s = ax[index].scatter(sza[mask], no2[mask], c=vza[mask], s=1)
    plt.colorbar(s, ax=ax[index], label='vza')
    ax[index].set_title(str(bin) + '<vza<' + str(bin + 5))
for ax in axs.flat:
    ax.set_xlabel('sza')
    ax.set_ylabel('no2')
plt.tight_layout()
I tried to fit the curve for one bin (45<vza<50):
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
xdata = sza[(vza>45)&(vza<50)]
ydata = no2[(vza>45)&(vza<50)]
popt, pcov = curve_fit(func, xdata, ydata, p0=(1, 1e-5, 1))
plt.plot(xdata, ydata, 'b-', label='data')
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.legend()
However, the fit did not give what I want:
Is it possible to meet both conditions below?
- Fit curve and get the background values for high values
- Add random noise to the fitted background values (this could be run several times to get more realistic values, similar to the surrounding background)
Or any other better methods?
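To make the two conditions concrete, here is a minimal sketch on made-up data, not the real test.npz arrays. It assumes the background in one vza bin is linear in sza and swaps the exponential curve_fit for an iteratively clipped np.polyfit; the names sza_bin, no2_bin and fit_background and the 3-sigma cutoff are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one vza bin (NOT the real data):
# a linear background in sza plus noise, with some artificially raised points.
sza_bin = rng.uniform(30, 60, 2000)
no2_bin = 0.02 * sza_bin + 1.0 + rng.normal(0, 0.05, sza_bin.size)
plume = rng.choice(sza_bin.size, 100, replace=False)
no2_bin[plume] += rng.uniform(0.5, 1.5, plume.size)


def fit_background(x, y, n_iter=5, k=3.0):
    """Fit y = a*x + b repeatedly, each time dropping points that lie
    more than k sigma ABOVE the current line (low points are kept)."""
    keep = np.ones(x.size, dtype=bool)
    for _ in range(n_iter):
        a, b = np.polyfit(x[keep], y[keep], 1)
        resid = y - (a * x + b)
        sigma = resid[keep].std()      # scatter of the points used in this fit
        keep = resid < k * sigma
    return a, b, sigma, ~keep          # ~keep marks the "high" points


a, b, sigma, high = fit_background(sza_bin, no2_bin)

# Condition 1: background value from the fit.
# Condition 2: add random noise with the scatter of the background.
no2_filled = no2_bin.copy()
no2_filled[high] = a * sza_bin[high] + b + rng.normal(0, sigma, high.sum())
```

Re-running the last two lines draws a new noise realisation each time. Whether a straight line per vza bin is good enough for the real data is exactly what I am unsure about.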
Gradient
I checked the gradient, hoping it would make the high values stand out more:
# https://stackoverflow.com/questions/34003993/generating-gradient-map-of-2d-array
grad = np.gradient(no2)
fulgrad = np.sqrt(grad[0]**2 + grad[1]**2)
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
ax1, ax2 = axs.flat
m = ax1.pcolormesh(no2)
plt.colorbar(m, ax=ax1)
ax1.set_title('no2')
m = ax2.pcolormesh(fulgrad)
plt.colorbar(m, ax=ax2)
ax2.set_title('no2 gradient')
plt.tight_layout()
However, it only shows the outlines of the high-value regions:
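One way the outlines might still be usable is to threshold the gradient magnitude and fill the enclosed area with scipy.ndimage.binary_fill_holes. The sketch below runs on a made-up field, not the real no2, and the threshold of 0.1 is a hypothetical value picked for this synthetic case; it only works when the outline forms a closed ring.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)

# Synthetic field (NOT the real no2): smooth background, weak noise,
# and a single Gaussian bump standing in for a high-value region.
ny, nx = 100, 120
yy, xx = np.mgrid[0:ny, 0:nx]
field = 1.0 + 0.005 * xx + rng.normal(0, 0.01, (ny, nx))
cy, cx = 50, 60
field += 2.0 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 8.0 ** 2))

# Gradient magnitude, same quantity as fulgrad above.
gy, gx = np.gradient(field)
grad_mag = np.hypot(gy, gx)

# Steep pixels form a ring around the bump; the flat centre is missed.
outline = grad_mag > 0.1               # hypothetical threshold

# Fill the interior of the closed ring to get a mask of the whole region.
region = ndimage.binary_fill_holes(outline)
```

Here outline[cy, cx] is False while region[cy, cx] is True, so the filled mask covers the centre that the gradient alone misses. On noisy real data the ring may have gaps, in which case the fill does nothing.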
Image processing
I can't figure out how to replace only the high values and keep the background unchanged using scikit-learn.
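The kind of operation I have in mind is sketched below with scipy.ndimage instead of scikit-learn (my substitution, only for illustration). It runs on a made-up field with two bumps, masks pixels above a hypothetical threshold, grows the mask, and copies the nearest unmasked pixel into every masked one. The threshold of 1.5 and the 4 dilation steps are assumptions that fit this synthetic case, not the real data.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)

# Synthetic field (NOT the real no2): flat noisy background with two bumps.
ny, nx = 100, 120
yy, xx = np.mgrid[0:ny, 0:nx]
field = 1.0 + rng.normal(0, 0.02, (ny, nx))
for cy, cx in [(30, 30), (70, 90)]:
    field += 2.0 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 6.0 ** 2))

# Mask the high pixels and grow the mask to cover the flanks of each bump.
thresh = 1.5                            # hypothetical threshold
mask = ndimage.binary_dilation(field > thresh, iterations=4)

# Count the connected high-value regions.
labels, n_regions = ndimage.label(mask)

# For every masked pixel, look up the indices of the nearest unmasked pixel
# and copy its value; unmasked pixels map to themselves and stay unchanged.
idx = ndimage.distance_transform_edt(
    mask, return_distances=False, return_indices=True
)
filled = field[tuple(idx)]
```

Nearest-pixel filling leaves flat patches and a faint halo where the mask ends on the flank of a bump, so it does not look like the surrounding noisy background. That is the part I do not know how to do properly.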