I am trying to implement scikit-learn's PolynomialFeatures as a layer in a feedforward neural network in TensorFlow and Keras. I'll give an example using NumPy arrays for the sake of simplicity. If a batch has three samples and the activations of a certain layer are equal to the (3, 2)-shaped matrix
>>> import numpy as np
>>> X = np.arange(0, 6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
then I would like the activations in the next layer to be equal to a degree-2 polynomial feature expansion of X:
>>> from sklearn.preprocessing import PolynomialFeatures
>>> PolynomialFeatures(degree=2).fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
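For reference, the column order above corresponds to the monomials 1, x0, x1, x0^2, x0*x1, x1^2; scikit-learn reports the same ordering (via get_feature_names below, or get_feature_names_out in newer releases):
>>> PolynomialFeatures(degree=2).fit(X).get_feature_names()
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']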
That is, if the activations of layer i are the matrix X (of shape (batch_size, num_features)), then for the parameter choice degree=2 I would like the activations of layer i + 1 to be a concatenation of
- a column of batch_size many 1.'s,
- X itself,
- and the element-wise products of all unordered pairs of the columns of X: X[:, 0] * X[:, 0], X[:, 0] * X[:, 1], and X[:, 1] * X[:, 1] (a NumPy sketch of this target follows the list).
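As a sanity check, here is a minimal NumPy sketch of that target (illustration only, not a solution); itertools.combinations_with_replacement enumerates the unordered column pairs:
import numpy as np
from itertools import combinations_with_replacement

X = np.arange(0, 6).reshape(3, 2).astype(float)

# Unordered pairs of column indices: (0, 0), (0, 1), (1, 1).
pairs = combinations_with_replacement(range(X.shape[1]), 2)

XP = np.column_stack(
    [np.ones(X.shape[0])]                    # column of batch_size many 1.'s
    + [X[:, j] for j in range(X.shape[1])]   # X itself
    + [X[:, i] * X[:, j] for i, j in pairs]  # products of unordered column pairs
)
# XP matches PolynomialFeatures(degree=2).fit_transform(X) above.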
My closest solution so far is to concatenate some powers of X:
import keras.backend as K

X = K.reshape(K.arange(0, 6), (3, 2))
with K.get_session().as_default():
    print(K.concatenate([K.pow(X, 0), K.pow(X, 1), K.pow(X, 2)]).eval())
Output:
[[ 1  1  0  1  0  1]
 [ 1  1  2  3  4  9]
 [ 1  1  4  5 16 25]]
i.e., a concatenation of two columns of 1s (one more than I'd like, but I can live with this duplication), X itself, and X squared element-wise.
Is there a way to compute products of different columns (in an automatically differentiable way)? The step of PolynomialFeatures that I cannot figure out how to implement in TensorFlow is filling a column of one matrix with the product (across axis=1) of selected columns of another matrix: XP[:, i] = X[:, c].prod(axis=1), where c is a tuple of column indices such as (0, 0, 1).
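To make that step concrete in NumPy terms (the tuple c = (0, 0, 1) here is just a hypothetical example, selecting columns 0, 0, 1 and so yielding the degree-3 monomial x0 * x0 * x1 per sample):
import numpy as np

X = np.arange(0, 6).reshape(3, 2).astype(float)
c = (0, 0, 1)                # hypothetical index tuple: x0 * x0 * x1
print(X[:, c].prod(axis=1))  # [ 0. 12. 80.]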