## The Linear Layer

We now introduce `Linear` layers, a neural network layer abstraction that lets us quickly build feedforward networks. For various historical and non-historical reasons, you may see other deep learning resources or libraries refer to these as *dense* or *perceptron* layers, but they all mean basically the same thing. For our purposes, a `Linear` layer will be one that simply applies a linear (strictly speaking, *affine*) transformation to its input. That is, for an input $X \in \mathbb{R}^{m \times n}$, it applies a transformation that looks like this:

$$\text{Linear}(X) = W X + b$$
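To make the definition concrete, here is a small NumPy sketch (NumPy standing in for our own tensor library) that applies $Wx + b$ to a single input written as a column vector $x$. The values of `W` and `b` are arbitrary and chosen only for illustration. Note that with a nonzero $b$, the zero input no longer maps to zero.

```python
import numpy as np

# Arbitrary example parameters: W maps R^3 -> R^2, b shifts the result
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
b = np.array([0.5, -0.5])

def affine(x):
    """Apply the affine map x -> W x + b to a single input vector x."""
    return W @ x + b

x = np.array([1.0, 2.0, 3.0])
print(affine(x))            # W x = [7, -1], plus b gives [7.5, -1.5]
print(affine(np.zeros(3)))  # the origin is sent to b, not to the origin
```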

We call $W$ the *weight matrix* for the layer and $b$ the *bias vector*. If you view $W$ as applying a linear map to $X$, then $b$ allows us to shift that mapping off the origin; a purely linear map always sends the zero input to zero. This is key to the representational power of the affine transformation. The Wikipedia article on affine transformations covers their interesting properties in more depth, but some of the key ones are that they preserve

- Collinearity (points that lie on a common line still lie on a common line after the transformation)
- Parallelism (parallel lines remain parallel after the transformation)
- Convexity (convex sets in the domain remain convex after the transformation is applied)
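All three properties follow from one identity: an affine map sends $x + t\,d$ to $f(x) + t\,Wd$, so directions are transformed by $W$ alone and the bias cancels in differences. The NumPy sketch below checks collinearity and parallelism numerically for a random $2 \times 2$ map, plus the convex-combination identity that is the reason convex sets stay convex. The points and the map are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random affine map f(x) = W x + b on R^2 (values are arbitrary)
W = rng.normal(size=(2, 2))
b = rng.normal(size=2)

def f(x):
    return W @ x + b

def cross2(u, v):
    """z-component of the 2-D cross product; zero iff u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Collinearity: p, q, r lie on one line (r = p + 2 (q - p))
p, q = np.array([0.0, 1.0]), np.array([2.0, 3.0])
r = p + 2.0 * (q - p)
print(cross2(f(q) - f(p), f(r) - f(p)))   # ~0: images are still collinear

# Parallelism: segments p->q and s->t point in the same direction
s = np.array([5.0, -1.0])
t = s + (q - p)
print(cross2(f(q) - f(p), f(t) - f(s)))   # ~0: images are still parallel

# Convexity (the key step): a convex combination maps to the same combination
lam = 0.3
print(np.allclose(f(lam * p + (1 - lam) * q), lam * f(p) + (1 - lam) * f(q)))
```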

Now that we've introduced the `Linear` layer, let's work on its implementation within our FlameFlower library.

## The Implementation

First things first, let's recall the `Module` class of our `nn` library. It provides a basic construction for neural network "parts" that we can string together to make a full model. As such, our `Linear` layer class will inherit from `nn.Module`. Now we can look at the class's `__init__` method. If you look back at the layer definition, all we need to do is specify the two parameters that comprise it: $W$ and $b$. Remember, $X$ is a matrix whose rows are our training examples and whose columns are the features. Therefore, the number of columns of $X$ will be the `in_size` of our layer. Instead of multiplying on the left by $W$ as in the layer definition, we'll multiply by it on the right, computing $XW + b$, so that the dimensions work out: since $X$ has shape $m \times n$, $W$ needs `in_size` $= n$ rows and `out_size` columns, which gives one row of `out_size` outputs per example. With this figured out, let's start the implementation of `__init__`.
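Before writing any code, here is a quick NumPy check of this shape bookkeeping, with NumPy arrays standing in for our `Tensor`s and arbitrary sizes. A batch `X` of shape `(batch, in_size)` times `W` of shape `(in_size, out_size)` yields `(batch, out_size)`. The last lines show that the right-multiplication convention is the same affine map as the column-vector definition, with the stored matrix playing the role of $W^\top$.

```python
import numpy as np

batch, in_size, out_size = 4, 3, 2
rng = np.random.default_rng(0)

X = rng.normal(size=(batch, in_size))     # one training example per row
W = rng.normal(size=(in_size, out_size))  # in_size rows, out_size columns
b = rng.normal(size=(1, out_size))        # one bias per output unit

Y = X @ W + b
print(Y.shape)  # (4, 2): one output row per input row

# Right-multiplying by W is the column-vector form W^T x + b applied to
# every example at once, just transposed.
Y_cols = (W.T @ X.T + b.T).T
print(np.allclose(Y, Y_cols))  # True
```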

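### The `__init__` Method

```python
def __init__(self, in_size, out_size):
    # Let nn.Module set up its own state first
    super(Linear, self).__init__()
    # Remember the layer's input and output dimensions
    self.in_size = in_size
    self.out_size = out_size
```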

Remember that because we're inheriting from `nn.Module`, we need to use Python's built-in `super()` function to call the parent class's `__init__`, so that the parent can set up its own state before we add ours.
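Why this matters: the parent's `__init__` is typically where the module sets up its own bookkeeping. The sketch below uses a hypothetical, stripped-down `Module` (the `_params` attribute is invented for illustration; our real `nn.Module` may store things differently) to show what goes wrong when a subclass skips the call. In Python 3, `super().__init__()` is equivalent to the explicit `super(Linear, self).__init__()` form used above.

```python
class Module:
    """Hypothetical stand-in for nn.Module: it sets up a parameter registry."""

    def __init__(self):
        self._params = {}

    def new_param(self, name, value):
        self._params[name] = value


class GoodLayer(Module):
    def __init__(self):
        super().__init__()        # runs Module.__init__, creating _params
        self.new_param('w', 1.0)


class BadLayer(Module):
    def __init__(self):
        # Forgot super().__init__(), so self._params never gets created
        self.new_param('w', 1.0)


print(GoodLayer()._params)  # {'w': 1.0}
try:
    BadLayer()
except AttributeError as err:
    print('BadLayer failed:', err)
```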

Next, let's add a couple of enhancements to the above `__init__` implementation. First, let's defer the actual initialization of the model parameters, `W` and `b`, to a private method, `_init_params()`. Second, let's add a keyword argument, `use_bias`, to `__init__`, which lets us specify whether the layer uses a bias. The updated `__init__` should now look like this.

```python
def __init__(self, in_size, out_size, use_bias=True):
    super(Linear, self).__init__()
    self.in_size = in_size
    self.out_size = out_size
    # Whether to add the bias vector b in forward()
    self.use_bias = use_bias
    # Create and register W (and b, if used)
    self._init_params()
```
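To see what `use_bias` changes, it helps to count trainable values. The helper below is hypothetical (not part of our library) and just does the arithmetic for the shapes we settled on: $W$ contributes `in_size * out_size` values and $b$ contributes `out_size` more. The sizes 784 and 128 are arbitrary.

```python
def linear_param_count(in_size, out_size, use_bias=True):
    """Number of trainable scalars in a Linear layer with the shapes used here."""
    weights = in_size * out_size           # W has shape (in_size, out_size)
    biases = out_size if use_bias else 0   # b has shape (1, out_size)
    return weights + biases

print(linear_param_count(784, 128))                  # 100480
print(linear_param_count(784, 128, use_bias=False))  # 100352
```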

### The `_init_params` Method

Now, let's turn our attention to filling the `W` and `b` parameters with their initial values. If you've read the lesson on parameter initialization (which you should have!), you'll know that there are various schemes for sampling the initial values of weight matrices and vectors. We'll implement these in a separate module, `initialize.py`, which we import as `init`. Rather than sampling values itself, the `_init_params` method will just take an optional `init_fn` keyword argument: a reference to the desired parameter initialization function, which handles all the value sampling. The default function we'll use is `glorot_uniform`, which handles the initialization of `W`. For `b`, we'll just initialize it to a vector of zeros. This is a common choice for initial biases and works well in practice: the randomly initialized `W` already breaks the symmetry between units, so the biases don't need to.
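For reference, here is a NumPy sketch of what a `glorot_uniform` function could look like. The `(in_size, out_size)` signature is inferred from how `_init_params` calls it; the optional `rng` argument is an addition for reproducibility, and our actual `init` module may differ. Glorot (Xavier) uniform initialization draws each weight from $U(-a, a)$ with $a = \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$, aiming to keep the variance of activations and gradients roughly equal across layers.

```python
import numpy as np

def glorot_uniform(in_size, out_size, rng=None):
    """Sketch of Glorot (Xavier) uniform initialization.

    Samples each weight from U(-limit, limit) with
    limit = sqrt(6 / (in_size + out_size)).
    """
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (in_size + out_size))
    return rng.uniform(-limit, limit, size=(in_size, out_size))

W = glorot_uniform(784, 128, rng=np.random.default_rng(0))
b = np.zeros((1, 128))  # biases start at zero
print(W.shape, b.shape)  # (784, 128) (1, 128)
print(np.abs(W).max() <= np.sqrt(6.0 / (784 + 128)))  # True
```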

Another thing we want to do is wrap `W` and `b` as `Tensor` objects. This ensures that they're tracked by `autograd` and will be optimized via backpropagation during neural network training.

Finally, we'll want to call `self.new_param(param_name, param)` for each of the parameters we initialize. This is an underlying method of the `Module` class that registers the parameter with the module, so that it can be used by `Optimizer`s (more on these later). Let's see what all of this looks like in code.

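```python
def _init_params(self, init_fn=None):
    # Default to Glorot (Xavier) uniform initialization for W
    if init_fn is None:
        init_fn = init.glorot_uniform
    # W has shape (in_size, out_size); wrapping it in a Tensor lets autograd track it
    self.W = Tensor(init_fn(self.in_size, self.out_size))
    self.new_param('W', self.W)
    # The bias is a row vector of zeros, created and registered only when used
    if self.use_bias:
        self.b = Tensor(tl.zeros((1, self.out_size)))
        self.new_param('b', self.b)
```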
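To see why registration matters, here is a hypothetical stand-in for the `Module` registry (the names `_params`, `parameters()`, `TinyLinear`, and `sgd_step` are illustrative, not our library's API), with plain NumPy arrays in place of `Tensor`s and hand-supplied gradients in place of `autograd`. An optimizer only ever sees what was passed to `new_param`, so an unregistered parameter would never be updated.

```python
import numpy as np

class Module:
    """Hypothetical stand-in for nn.Module's parameter registry."""

    def __init__(self):
        self._params = {}

    def new_param(self, name, param):
        self._params[name] = param

    def parameters(self):
        return list(self._params.values())


class TinyLinear(Module):
    def __init__(self, in_size, out_size, use_bias=True):
        super().__init__()
        self.W = np.ones((in_size, out_size))
        self.new_param('W', self.W)
        self.use_bias = use_bias
        if use_bias:
            self.b = np.zeros((1, out_size))
            self.new_param('b', self.b)


def sgd_step(params, grads, lr=0.1):
    """Update each registered parameter in place: p <- p - lr * g."""
    for p, g in zip(params, grads):
        p -= lr * g


layer = TinyLinear(3, 2)
params = layer.parameters()
print(len(params))                                         # 2 (W and b)
print(len(TinyLinear(3, 2, use_bias=False).parameters()))  # 1

# Pretend every gradient is all ones (in practice, autograd computes these)
sgd_step(params, [np.ones_like(p) for p in params])
print(layer.W[0, 0], layer.b[0, 0])  # 0.9 -0.1
```

Because `sgd_step` updates the arrays in place, the layer's own `W` and `b` attributes see the change.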

### The `forward` Method

Now for the bread and butter of the implementation. If you recall from the `Module` section, every `Module` must implement a `forward()` method, which specifies the model's computation when it is called on an input. In our case, we just implement the simple equation from the `Linear` layer definition, multiplying by $W$ on the right as discussed earlier. Remember, `@` is Python's matrix multiplication operator; for NumPy arrays it performs the same operation as `np.matmul`. The code looks as follows.

```python
def forward(self, X):
    # (batch, in_size) @ (in_size, out_size) -> (batch, out_size)
    if self.use_bias:
        return X @ self.W + self.b
    return X @ self.W
```
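One detail the code leaves implicit: `b` has shape `(1, out_size)` while `X @ self.W` has shape `(batch, out_size)`, so the addition relies on broadcasting to add the same bias row to every example. Here is a NumPy sketch, with NumPy arrays standing in for our `Tensor`s and assuming our tensor library follows NumPy's broadcasting rules; the free function `forward` simply mirrors the method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))     # batch of 4 examples, 3 features each
W = rng.normal(size=(3, 2))
b = np.array([[10.0, -10.0]])   # shape (1, 2)

def forward(X, W, b=None):
    """Mirror of Linear.forward: X @ W, plus b when a bias is used."""
    out = X @ W
    if b is not None:
        out = out + b           # b broadcasts from (1, 2) to (4, 2)
    return out

with_bias = forward(X, W, b)
without_bias = forward(X, W)
print(with_bias.shape)  # (4, 2)
# Every row received exactly the same bias row
print(np.allclose(with_bias - without_bias, np.tile(b, (4, 1))))  # True
```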

## The Entire Thing (Imports and All)

```python
from .module import Module
from flamethrower.autograd import Tensor
import flamethrower.autograd.tensor_library as tl
import flamethrower.nn.initialize as init


class Linear(Module):
    """Fully connected layer computing X @ W + b."""

    def __init__(self, in_size, out_size, use_bias=True):
        super(Linear, self).__init__()
        self.in_size = in_size
        self.out_size = out_size
        self.use_bias = use_bias
        self._init_params()

    def _init_params(self, init_fn=None):
        # Default to Glorot (Xavier) uniform initialization for W
        if init_fn is None:
            init_fn = init.glorot_uniform
        self.W = Tensor(init_fn(self.in_size, self.out_size))
        self.new_param('W', self.W)
        # Zero-initialized bias row, created and registered only when used
        if self.use_bias:
            self.b = Tensor(tl.zeros((1, self.out_size)))
            self.new_param('b', self.b)

    def forward(self, X):
        # (batch, in_size) @ (in_size, out_size) -> (batch, out_size)
        if self.use_bias:
            return X @ self.W + self.b
        return X @ self.W
```
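Since the listing depends on the rest of our library, here is a self-contained NumPy sketch that exercises the same logic end to end. Everything above the `Linear` class is a hypothetical stand-in: `Tensor` is just `np.asarray`, `glorot_uniform` is a local sketch, and the assumption that calling a module dispatches to `forward()` is labeled in the code. The `Linear` class mirrors the listing, with the bias initialized to zeros as the text describes.

```python
import numpy as np

# --- Hypothetical stand-ins so the sketch runs without our library ---
Tensor = np.asarray  # assumption: plain arrays in place of autograd Tensors


def glorot_uniform(in_size, out_size, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (in_size + out_size))
    return rng.uniform(-limit, limit, size=(in_size, out_size))


class Module:
    def __init__(self):
        self._params = {}

    def new_param(self, name, param):
        self._params[name] = param

    def __call__(self, *args):
        # assumption: calling a module dispatches to forward()
        return self.forward(*args)


# --- Linear, mirroring the listing ---
class Linear(Module):
    def __init__(self, in_size, out_size, use_bias=True):
        super().__init__()
        self.in_size = in_size
        self.out_size = out_size
        self.use_bias = use_bias
        self._init_params()

    def _init_params(self, init_fn=None):
        if init_fn is None:
            init_fn = glorot_uniform
        self.W = Tensor(init_fn(self.in_size, self.out_size))
        self.new_param('W', self.W)
        if self.use_bias:
            self.b = Tensor(np.zeros((1, self.out_size)))
            self.new_param('b', self.b)

    def forward(self, X):
        if self.use_bias:
            return X @ self.W + self.b
        return X @ self.W


layer = Linear(3, 2)
X = np.ones((5, 3))                                  # batch of 5 examples
print(layer(X).shape)                                # (5, 2)
print(sorted(layer._params))                         # ['W', 'b']
print(sorted(Linear(3, 2, use_bias=False)._params))  # ['W']
```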