June 2026

PixelModel: When the Weights Are the Image

What if your neural network's weights weren't stored in some binary file or checkpoint, but were literally encoded in the pixels of a PNG image? That's the premise behind PixelModel, a playful experiment where the model is the image.

The Core Idea

In PixelModel, model.png isn't a picture of anything. It is the model. Every pixel's RGB values encode neural network weights.

At inference time, pixels are parsed into three weight matrices forming a tiny MLP. The prompt is embedded into a vector, then a forward pass generates a 32×32 image. Training directly optimizes pixel values via gradient descent until the PNG itself becomes the model.
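As a rough illustration of the parsing step, the sketch below slices a flat run of pixel data into three weight matrices. The byte-to-float mapping (each 8-bit channel value scaled linearly to [-1, 1]), the row-major packing order, and the names decode_weights and SHAPES are all assumptions made for this example; the real model.py may encode weights differently.

```python
import numpy as np

# Layer shapes (rows x cols) as listed in the architecture.
SHAPES = [(64, 32), (64, 64), (3072, 64)]

def decode_weights(pixels, shapes=SHAPES):
    """Split uint8 pixel data into float weight matrices.

    Assumption: one weight per channel value, mapped from [0, 255] to [-1, 1]
    and read in row-major order. The actual packing is not documented here.
    """
    flat = pixels.astype(np.float32).ravel() / 127.5 - 1.0
    needed = sum(rows * cols for rows, cols in shapes)
    if flat.size < needed:
        raise ValueError(f"need {needed} values, image holds {flat.size}")
    weights, offset = [], 0
    for rows, cols in shapes:
        count = rows * cols
        weights.append(flat[offset:offset + count].reshape(rows, cols))
        offset += count
    return weights

# Stand-in for a loaded model.png: random bytes, 64 * 3200 pixels, 3 channels.
rng = np.random.default_rng(0)
fake_png = rng.integers(0, 256, size=(64 * 3200, 3), dtype=np.uint8)
W1, W2, W3 = decode_weights(fake_png)
print(W1.shape, W2.shape, W3.shape)
```

Under this assumed mapping, any values left over after the three matrices are filled are simply ignored.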

Architecture

The model turns a text prompt into a 32×32 RGB image through a three-layer pipeline:

prompt string
  → char-level embedding → 32-dim vector
  → W1 (64×32) → tanh
  → W2 (64×64) → tanh
  → W3 (3072×64) → sigmoid
  → reshape → 32×32×3 image
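
The pipeline above can be sketched in a few lines of NumPy. The layer shapes and activations follow the listing; the embedding is a hypothetical stand-in (characters counted into 32 buckets), since the post doesn't spell out how the char-level embedding works, and biases are left out because only weight matrices are listed.

```python
import numpy as np

def embed_prompt(prompt, dim=32):
    """Hypothetical char-level embedding: count characters into `dim` buckets."""
    vec = np.zeros(dim, dtype=np.float32)
    for ch in prompt:
        vec[ord(ch) % dim] += 1.0
    return vec / max(len(prompt), 1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(prompt, W1, W2, W3):
    """Run the three-layer MLP and return a 32x32x3 array with values in [0, 1]."""
    x = embed_prompt(prompt)       # (32,)
    h1 = np.tanh(W1 @ x)           # (64,)
    h2 = np.tanh(W2 @ h1)          # (64,)
    out = sigmoid(W3 @ h2)         # (3072,)
    return out.reshape(32, 32, 3)

# Random weights stand in for the ones decoded from model.png.
rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, size=(64, 32))
W2 = rng.uniform(-1, 1, size=(64, 64))
W3 = rng.uniform(-1, 1, size=(3072, 64))
image = forward("a cat", W1, W2, W3)
print(image.shape)  # (32, 32, 3)
```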

All weights live inside model.png. Opening the PNG is literally opening the neural network.

Usage

Training

Training is straightforward. You provide 6–20 image-prompt pairs, and the model learns to associate prompts with images by optimizing the pixel values directly:

python train.py
python train.py --epochs 500 --lr 0.05

Simple targets converge fastest: solid colors, gradients, and basic shapes work well. Typically, 200–500 epochs are sufficient, and a loss below 0.001 indicates good convergence for simple datasets.
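
To make "optimizing the pixel values directly" a bit more concrete, here is a deliberately tiny sketch: a single sigmoid layer whose weights are stored as continuous pixel values in [0, 1], trained with plain gradient descent on a mean-squared-error loss and then rounded to 8 bits. The one-layer model, the pixel-to-weight mapping, the SCALE constant, and the toy data are all assumptions for illustration, not what train.py does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 input vectors (stand-ins for prompt embeddings) and 4 targets.
X = rng.uniform(-1, 1, size=(4, 8))      # (samples, in_dim)
Y = rng.uniform(0.2, 0.8, size=(4, 6))   # (samples, out_dim)

# The trainable parameters are pixel values in [0, 1], not the weights themselves.
pixels = rng.uniform(0.4, 0.6, size=(6, 8))
SCALE = 4.0  # assumed weight range: pixels map to [-SCALE, SCALE]

def to_weights(p):
    return (2.0 * p - 1.0) * SCALE

def loss_and_grad(p):
    W = to_weights(p)
    pred = 1.0 / (1.0 + np.exp(-(X @ W.T)))   # sigmoid output, (4, 6)
    err = pred - Y
    loss = np.mean(err ** 2)
    # Chain rule: MSE -> sigmoid -> linear layer -> pixel-to-weight mapping.
    dz = 2.0 * err / err.size * pred * (1.0 - pred)
    dW = dz.T @ X                              # (6, 8)
    return loss, dW * 2.0 * SCALE

initial_loss, _ = loss_and_grad(pixels)
for epoch in range(500):
    loss, grad = loss_and_grad(pixels)
    # Clip so every value stays storable as a pixel.
    pixels = np.clip(pixels - 0.05 * grad, 0.0, 1.0)

final_loss, _ = loss_and_grad(pixels)
quantized = np.round(pixels * 255).astype(np.uint8)  # what would go into the PNG
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Note that rounding to 8 bits nudges every weight slightly, so a real implementation has to tolerate (or train around) that quantization error.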

Generation

Once the model is trained, generate an image by passing a prompt to main.py:

python main.py "red"
python main.py "a cat" --out cat_out.png --scale 8

The --scale 8 flag upscales the 32×32 output to 256×256 using nearest-neighbour interpolation to preserve the pixel structure.
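
Nearest-neighbour upscaling just repeats each pixel into a scale×scale block, so no new colors are introduced. A minimal NumPy sketch of the idea follows; upscale_nearest is a hypothetical helper, not necessarily how main.py implements --scale.

```python
import numpy as np

def upscale_nearest(image, scale):
    """Repeat every pixel `scale` times along height and width."""
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

# Random 32x32 RGB image standing in for the model output.
rng = np.random.default_rng(0)
small = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
big = upscale_nearest(small, 8)
print(big.shape)  # (256, 256, 3)
```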

File Structure

The repository is minimal and self-contained:

model.png       ← THE MODEL (64×3200 px, ~284 KB)
main.py         ← inference
train.py        ← training
model.py        ← architecture (pixels → weights → forward pass)
dataset/        ← training data
  cat.png
  cat.txt       ← prompt: "a cat"
...
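
Given that layout, collecting the training pairs could look something like the sketch below. It assumes each NAME.png sits next to a NAME.txt holding its prompt, as in the cat example; find_pairs is a hypothetical helper and train.py may load its data differently.

```python
from pathlib import Path

def find_pairs(dataset_dir):
    """Return (image_path, prompt) tuples for every NAME.png that has a NAME.txt."""
    pairs = []
    for image_path in sorted(Path(dataset_dir).glob("*.png")):
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.exists():
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs
```

Calling find_pairs("dataset") on the tree above would pair dataset/cat.png with the prompt "a cat"; images without a matching .txt file are skipped.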

Why Build This?

It's a toy. It's not useful. But it's cool that it works.

PixelModel has a fixed capacity of approximately 600K implicit parameters. While it won't replace your favorite diffusion model, it's a fascinating demonstration of how neural network weights can be encoded in unconventional ways.
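
One plausible reading of the ~600K figure is the number of channel values in model.png. That's an inference from the stated 64×3200 image size rather than something confirmed here, but the arithmetic is easy to check against the layer shapes from the architecture:

```python
# Weights required by the three layers listed in the architecture.
layer_shapes = [(64, 32), (64, 64), (3072, 64)]
weights_needed = sum(rows * cols for rows, cols in layer_shapes)

# Values available in a 64x3200 RGB image.
pixel_count = 64 * 3200
channel_values = pixel_count * 3

print(weights_needed)   # 202752
print(pixel_count)      # 204800
print(channel_values)   # 614400
```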

The project explores the boundaries of how we think about model storage and representation. What if your model could be shared as simply as an image file? What if you could see your neural network just by opening it in an image viewer?

Try It Yourself

The full code and trained model are available on the Hugging Face Hub. Clone the repository, provide your own image-prompt pairs, and watch as gradient descent transforms a PNG into a functioning neural network.

Check out the PixelModel repository to get started.