AI Dictionary
Intermediate · ~2 min read · #neural-network #deep-learning

Neural Network

Layered learning model

A layered mathematical model loosely inspired by how neurons connect in the brain.

[Diagram: input layer → two hidden layers → output layer; each connection has a weight tuned during training]
Definition

A neural network has an input layer, one or more hidden layers, and an output layer. Each connection has a weight. During training those weights are repeatedly adjusted so the prediction error shrinks (via the backpropagation algorithm).

A network with many hidden layers is called a deep neural network. This is the foundation of modern AI: CNNs handle vision, RNNs and Transformers handle text, GANs and diffusion models handle generation, and all of them are built on this base.

What the network "learns" is really just millions (or billions) of numbers: the weights. A 7B LLM is 7 billion tuned numbers. You can't read them by hand, but together they can produce language.
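
To make this concrete, here's a minimal sketch of one forward pass through a tiny 2→3→1 network (plain NumPy, with made-up weights rather than anything from a real trained model):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# A made-up network: 2 inputs -> 3 hidden neurons -> 1 output.
# "Learning" would mean adjusting W1, b1, W2, b2; here they are fixed for illustration.
W1 = np.array([[ 0.5, -0.2,  0.8],
               [ 0.1,  0.9, -0.4]])      # input -> hidden weights (2x3)
b1 = np.zeros(3)
W2 = np.array([[ 0.3], [-0.7], [ 0.6]])  # hidden -> output weights (3x1)
b2 = np.zeros(1)

x = np.array([1.0, 2.0])       # one input example
hidden = relu(x @ W1 + b1)     # hidden layer activations
output = hidden @ W2 + b2      # network output
print(output)
```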

Analogy

Imagine a wall of kitchen faucets. You're tweaking each one a little to get the right water temperature. You keep adjusting until the output feels right. In a neural net, "faucets" = weights, "right temperature" = correct answer. Tuning 7 billion faucets sounds hard — that's exactly what gradient descent does.
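
The analogy maps directly onto the simplest possible gradient descent loop. A toy sketch with a single "faucet" (one weight) and a made-up target:

```python
# One "faucet": a single weight w, tuned by gradient descent.
# Toy setup: we want w * x to equal the target y for x = 3, y = 6 (so the ideal w is 2).
x, y = 3.0, 6.0
w = 0.0                      # start with the faucet fully closed
learning_rate = 0.05

for step in range(50):
    prediction = w * x
    error = prediction - y            # how far from the "right temperature"
    gradient = 2 * error * x          # derivative of the squared error w.r.t. w
    w -= learning_rate * gradient     # nudge the faucet the right way

print(w)   # ends up close to 2.0
```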

Real-world example

Handwritten digit recognition (the classic MNIST example): 28×28 pixel image, 784 input neurons. Then a 128-neuron hidden layer. Then 10 output neurons (digits 0-9). This small network with ~100K weights hits ~98% accuracy.
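
As a rough sketch, that 784-128-10 architecture could look like this in PyTorch (assuming the usual MNIST setup; the training loop and exact accuracy depend on details not shown here):

```python
import torch
import torch.nn as nn

# 784 -> 128 -> 10, matching the example above.
model = nn.Sequential(
    nn.Flatten(),            # 28x28 image -> 784-long vector
    nn.Linear(784, 128),     # input -> hidden: 784*128 weights + 128 biases
    nn.ReLU(),
    nn.Linear(128, 10),      # hidden -> output: 128*10 weights + 10 biases
)

total_params = sum(p.numel() for p in model.parameters())
print(total_params)          # 101,770 -- the "~100K weights" in the text

# Training would pair this with nn.CrossEntropyLoss() and an optimizer
# such as torch.optim.Adam(model.parameters(), lr=1e-3).
```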

The big version of the same architecture: GPT-4 is reported to have on the order of 1.8 trillion weights (the exact figure isn't public). Same math, just many more layers and connections.

When to use
  • Complex, non-linear patterns (vision, audio, language)
  • Plenty of data — neural nets thrive on scale
  • Raw data where you can't engineer features — the net does it for you
  • When you need transfer learning: fine-tune a pretrained net (see the sketch after this list)
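
For the transfer-learning point above, the usual pattern looks roughly like this (torchvision's ResNet-18 is just an arbitrary example of a pretrained net, and the weights argument shown assumes a recent torchvision version):

```python
import torch.nn as nn
from torchvision import models

# Load a net pretrained on ImageNet and reuse its learned weights.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained layers so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for our own task (say, 5 classes).
model.fc = nn.Linear(model.fc.in_features, 5)

# Now train as usual; only model.fc's weights get updated.
```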
When not to use
  • Scarce data: too few examples means high overfitting risk; classical ML is better
  • Small tabular data (spreadsheet-sized) — XGBoost usually beats a neural net
  • Compute is tight — neural nets eat GPU
  • Full explainability is required — the network is a black box
Common pitfalls

Vanishing/exploding gradients

In very deep networks gradients either vanish (learning stalls) or explode (values blow up to infinity or NaN). Residual connections (ResNets), batch normalization, and proper weight initialization exist to fix this.
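
One of those fixes, the residual (skip) connection, is simple enough to sketch. A toy PyTorch block, not taken from any particular paper:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A toy residual block: the input skips past the layers and is added back."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # The "+ x" skip path gives gradients a direct route backwards,
        # which is what keeps them from vanishing in very deep stacks.
        return self.net(x) + x
```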

Wrong activation function

ReLU is today's default but not always right. The output-layer choice is critical: sigmoid (binary), softmax (multi-class), linear (regression).
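
A quick sketch of those three output setups in PyTorch. Note that BCEWithLogitsLoss and CrossEntropyLoss apply the sigmoid/softmax internally, so the model itself outputs raw logits:

```python
import torch.nn as nn

hidden = 128  # size of the last hidden layer, arbitrary here

# Binary classification: 1 output, sigmoid applied inside the loss.
binary_head = nn.Linear(hidden, 1)
binary_loss = nn.BCEWithLogitsLoss()

# Multi-class classification: one output per class, softmax applied inside the loss.
multiclass_head = nn.Linear(hidden, 10)
multiclass_loss = nn.CrossEntropyLoss()

# Regression: linear output, no activation, plain squared error.
regression_head = nn.Linear(hidden, 1)
regression_loss = nn.MSELoss()
```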

Hyperparameters as magic numbers

Learning rate, batch size, number of layers — these need trial and error. Expecting defaults to 'just work' is a trap; every dataset is different.
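
In practice that trial and error is usually automated with a small grid or random search. A toy sketch; train_and_evaluate here is a hypothetical stand-in for your own training loop:

```python
import itertools
import random

learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [32, 64, 128]

def train_and_evaluate(lr, batch_size):
    """Hypothetical stand-in for a real training run: train with these
    settings and return validation accuracy. Here it returns a random
    number just so the sketch runs on its own."""
    return random.random()

best = None
for lr, bs in itertools.product(learning_rates, batch_sizes):
    accuracy = train_and_evaluate(lr, bs)
    if best is None or accuracy > best[0]:
        best = (accuracy, lr, bs)

print("best (accuracy, lr, batch_size):", best)
```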