This summer I found an Erlang tutorial on artificial neural networks – you can find it here. Apparently, it’s an aggregation of a series of blog posts written a few years back. While working through it, I ran into some continuity problems in the later parts of the tutorial, along with a few errors (the derivative of the sigmoid was wrong). It was good overall though, and if anyone’s interested, I managed to put together something that executes here. There’s still a lot of unused, useless, or confusing code in there that should be killed, so try to ignore it.
Here are a few notes that I jotted down quickly (you can gather the same from the links above):
- Neurons are modeled by perceptrons
- Perceptrons = summing junction + nonlinear element
- Summing junction – take the dot product of the input vector with the coefficients (the weights, which are the perceptron’s state)
- Nonlinear element – take that dot product and map it through the sigmoid function to get your output
- You hook up a bunch of these perceptrons into a DAG, a directed acyclic graph (basically a trellis without the one-input-node restriction), and call it a neural network
- You train the neural network by putting in some input signals, comparing the generated output with an expected output, and performing backpropagation to adjust the coefficients
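The notes above can be sketched for a single perceptron. This is a minimal illustration in Python rather than the tutorial's Erlang; the function names, the squared-error loss, and the learning rate of 0.5 are assumptions made for the example, not something taken from the tutorial.

```python
import math

def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def perceptron_output(weights, inputs):
    """Summing junction (dot product) followed by the nonlinear element."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)

def train_step(weights, inputs, expected, learning_rate=0.5):
    """One weight update for a single perceptron, assuming squared error."""
    total = sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(total)
    # Error signal: the output error scaled by the slope of the sigmoid.
    delta = (expected - output) * sigmoid_derivative(total)
    # Nudge each weight in proportion to the input that fed it.
    return [w + learning_rate * delta * x for w, x in zip(weights, inputs)]

if __name__ == "__main__":
    weights = [0.1, -0.2]   # hypothetical starting coefficients
    inputs = [1.0, 0.5]     # hypothetical input vector
    print("before:", perceptron_output(weights, inputs))
    weights = train_step(weights, inputs, expected=1.0)
    print("after: ", perceptron_output(weights, inputs))
```

Running it should show the output moving toward the expected value of 1.0 after one step. A full network repeats the same idea layer by layer, passing each perceptron's error signal back to the perceptrons that fed it.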
With that high-level review, I think the tutorial only got as far as hooking up a small neural network with one input layer, one hidden layer, and one output layer, and training it. I’m not sure the code I have gets it right because, well, where’s the expected output? -_- And I’m not exactly sure how to use the network once it’s trained. But I thought I’d put the code up so that I could look at it again now that a significant amount of time has passed and I’ve learned some related things that might help it make more sense. It sort of does, from a high-level point of view, but I couldn’t explain the details of how the neural network actually adjusts a hyperplane to classify an input vector; I didn’t work any proofs, I just wrote some Erlang.
Just a thought: if the neural network is just doing high-dimensional gradient descent, why can’t we solve the problem by implementing gradient descent directly? I guess a solution in the form of a neural network is somehow more general? Or maybe it’s simpler than that: you’re really just feeding some training data to the network and then using it to predict new data. I don’t know. It’s an interesting approach though.
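One way to poke at that question: for a single perceptron with a squared-error loss, the backpropagation update is a gradient descent step on that loss. The sketch below checks this numerically by comparing the gradient from the backprop formula against a finite-difference estimate. It is in Python, and the loss function, names, and sample values are all assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def loss(weights, inputs, expected):
    """Squared error of one perceptron: 0.5 * (expected - output)^2."""
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    return 0.5 * (expected - output) ** 2

def numerical_gradient(weights, inputs, expected, eps=1e-6):
    """Estimate d(loss)/d(weight) by central finite differences."""
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        diff = loss(up, inputs, expected) - loss(down, inputs, expected)
        grad.append(diff / (2 * eps))
    return grad

def backprop_gradient(weights, inputs, expected):
    """The same gradient from the chain rule, as backpropagation computes it."""
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # d(loss)/d(total) = (output - expected) * sigmoid'(total)
    delta = (output - expected) * output * (1.0 - output)
    return [delta * x for x in inputs]

if __name__ == "__main__":
    weights = [0.3, -0.7, 0.2]   # hypothetical coefficients
    inputs = [1.0, 0.5, -1.5]    # hypothetical input vector
    print(numerical_gradient(weights, inputs, expected=1.0))
    print(backprop_gradient(weights, inputs, expected=1.0))
```

The two printed lists should agree to several decimal places. Seen this way, backpropagation is less an alternative to gradient descent than a way of computing the gradient efficiently when the loss is built from layers of perceptrons.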