1. Learning and Neural Networks
PRELIMINARY AND IDIOSYNCRATIC! This belongs in Part III, but was placed here to avoid changing hard links to sections.
    Back in the 1990s I purchased a software program (they weren't called apps) that still exists but you've probably never heard of: Dragon Naturally Speaking. It let me speak into a special headset and it would produce the text in a plain file. I thought this might speed up my glacial output of research papers and make responding to email much quicker. However, it did not work out of the box. First you had to train it by reading several paragraphs out loud so it could learn your voice. But even then it was terribly inaccurate, and I never actually used it to do anything.

    Paying perhaps $50 for bad software may seem silly now that your phone or smart speaker can understand you, or even translate a conversation into a different language in real time. But the basic method is really the same. The difference is simply computing power and much more sophisticated "training" of the system.

    This leads us to the world of neural networks (NN) and machine learning (ML). We will cover only the basics, with the purpose of giving you some conception of how packages/modules work rather than simply using them naively. ML is being used in economics now, sometimes as a gimmick and sometimes to do something better than other techniques. However, compared to the leap from the Naturally Speaking software of the 1990s to voice recognition today, computational economics continues to rely on algorithms we cover in this course.

    (Artificial) neural networks involve many of the techniques already covered in this course, including linear equations, binary responses, and optimization of an objective.

  1. Neurons
    An artificial neuron is really just a linear equation. In our earlier notation we might write it as: $$y=ax+b.$$ A few notes about jargon and minor differences in concepts! In neural networks the constant $b$ is called the bias; $x$ is an $N\times 1$ vector of inputs; the $1\times N$ vector $a$ does not contain coefficients: it contains weights. The $y$ is still considered output. From a matrix algebra point of view we could fold the bias into the weights by simply concatenating a 1 onto $x$ and placing $b$ at the end of the weight vector. Using Ox operators:

    $$y = (a~b)(x|1) = a^\star x^\star.$$
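
    To make this concrete, here is a minimal sketch in Ox of a single neuron evaluated both ways; the input values, weights, and bias below are made up for the example.

        #include <oxstd.h>

        main() {
            decl x = <0.5; 1.2; 2.0>;   // N x 1 input vector (made-up values)
            decl a = <0.1, 0.4, 0.3>;   // 1 x N weight vector
            decl b = 0.25;              // bias

            decl y = a*x + b;           // neuron output: y = ax + b

            // Equivalent form with the bias folded into the weights:
            // y = (a~b)(x|1), using Ox's concatenation operators.
            decl astar = a ~ b;         // 1 x (N+1)
            decl xstar = x | 1;         // (N+1) x 1
            println("y = ", y, "  y* = ", astar * xstar);
        }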

    That's all a neuron is: it responds to real-valued inputs and produces a single output. But the output $y$ is often not the end of the story. The output is transformed by an activation, $V(y)$. These transformations have names: $$\eqalign{ \hbox{Linear:}\quad V(y) &= y\cr \hbox{RectiLinear:}\quad V(y) &= I_{y\gt 0}\, y\cr \hbox{Sigmoid:}\quad V(y) &= {e^y \over 1+ e^y} = {1\over 1+e^{-y}}\cr }$$

    What makes NN powerful are these additional elements: neurons can be placed together in layers; their outputs become the inputs for another layer of neurons; and the weights in $a$ and the bias $b$ can be chosen through a process called training. Training is the process of choosing weights and biases to match the output of the network to external data. The data include observed inputs paired with the outputs the NN is supposed to produce. Once trained, the NN can be used on new inputs to predict new output. This is what the Naturally Speaking software needed to do: the sounds I made when reading the training text were the input. The NN was trained to take those input signals and produce the words on the page. After it was trained, the hope was that the NN could predict, from the sounds I made, the words I wanted in the document.
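
    Each of these activations is a one-line function in Ox; this is a sketch (the function names are mine, not from any package). Because the operators apply element-by-element, the same functions also work on vectors of outputs:

        #include <oxstd.h>

        linear(const y)      { return y; }                  // V(y) = y
        rectilinear(const y) { return (y .> 0) .* y; }      // V(y) = I_{y>0} y
        sigmoid(const y)     { return 1 ./ (1 + exp(-y)); } // V(y) = 1/(1+e^{-y})

        main() {
            decl y = <-2.0, 0.0, 2.0>;   // some made-up neuron outputs
            println("linear:      ", linear(y));
            println("rectilinear: ", rectilinear(y));
            println("sigmoid:     ", sigmoid(y));
        }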

  2. Dense Layer
    A layer is simply a set of neurons all connected to the same inputs. Each neuron is defined by its vector of weights $a$ (which includes the bias in our notation). A layer of $M$ neurons can then be represented as a matrix $A$ consisting of the stacked row vectors: $$y = Ax = \pmatrix{a_0 \cr a_1 \cr \vdots \cr a_{M-1}} x.$$ Now $y$ is a vector of outputs from the layer of neurons. The outputs are activated by an activation $V(y)$, where now $V$ applies the scalar activation defined above to each element of the $y$ vector. So a layer $L=L(A,V)$ is a matrix of weights $A$ and an activation $V$. A dense layer simply means every neuron in the layer receives the same full input $x$, as in the sketch below.
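
    A dense layer is then one matrix product followed by the elementwise activation. A minimal sketch in Ox, again with made-up weights, the bias stored in the last column of $A$, and a function name (DenseLayer) that is purely illustrative:

        #include <oxstd.h>

        sigmoid(const y) { return 1 ./ (1 + exp(-y)); }

        // Evaluate a dense layer L(A,V): A is M x (N+1) with each row (a~b),
        // x is N x 1, and the activation V applies element-by-element.
        DenseLayer(const A, const x) {
            return sigmoid( A * (x|1) );
        }

        main() {
            decl x = <0.5; 1.5>;              // N = 2 inputs (made up)
            decl A = <0.1, 0.4, 0.2;          // M = 3 neurons, rows are (a~b)
                      0.3, 0.1, 0.0;
                      0.2, 0.5, 0.1>;
            println("layer output: ", DenseLayer(A, x));  // 3 x 1 vector
        }
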
  3. Neural Network
    A neural network is a list of layers: $$NN = \{L_0,L_1,\dots,L_m\}.$$ Although neurons within a layer have the same dimensions, the dimensions can differ across layers: the output of layer $L_k$ becomes the input to layer $L_{k+1}$, so the number of weights in each neuron of $L_{k+1}$ must match the number of neurons in $L_k$.
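
    Putting the pieces together, evaluating a network just means feeding the output of each layer into the next. A minimal sketch, representing the network as an Ox array of weight matrices and using the sigmoid activation at every layer (both choices, and all the values, are illustrative):

        #include <oxstd.h>

        sigmoid(const y) { return 1 ./ (1 + exp(-y)); }

        // Feed x forward through an array of weight matrices.
        // Each A_k is M_k x (N_k+1); the bias sits in the last column,
        // so each layer computes V( A_k (x|1) ).
        FeedForward(const NN, const x0) {
            decl k, x = x0;
            for (k = 0; k < sizeof(NN); ++k)
                x = sigmoid( NN[k] * (x|1) );
            return x;
        }

        main() {
            // Two layers: 2 inputs -> 3 hidden neurons -> 1 output.
            decl A0 = <0.1, 0.4, 0.2; 0.3, 0.1, 0.0; 0.2, 0.5, 0.1>;  // 3 x 3
            decl A1 = <0.5, 0.25, 0.75, 0.1>;                         // 1 x 4
            println("network output: ", FeedForward({A0, A1}, <0.5; 1.5>));
        }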