PixelModel: When the Weights Are the Image
What if your neural network's weights weren't stored in some binary file or checkpoint, but were literally encoded in the pixels of a PNG image? That's the premise behind PixelModel, a playful experiment where the model is the image.
The Core Idea
In PixelModel, model.png isn't a picture of anything. It is the model. Every pixel's RGB values encode neural network weights:
- Red channel: weight magnitude
- Blue channel: weight sign (≥128 = positive)
- Green channel: bias values
At inference time, the pixels are parsed into three weight matrices that form a tiny MLP. The prompt is embedded into a 32-dimensional vector, and a forward pass over that vector produces a 32×32 RGB image. Training optimizes the pixel values directly via gradient descent, so the trained PNG itself is the model.
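To make the channel scheme concrete, here is a minimal sketch of how one pixel might be decoded into a signed weight and a bias, and encoded back. Only the roles of the channels and the ≥128 sign threshold come from the description above. `WEIGHT_SCALE`, the bias mapping, and the function names `decode_pixel` and `encode_weight` are assumptions made for illustration, not PixelModel's actual code.

```python
# Hypothetical decoding of one RGB pixel into a (weight, bias) pair.
# WEIGHT_SCALE and the bias mapping are illustrative assumptions.
WEIGHT_SCALE = 4.0  # assumed largest representable |weight|


def decode_pixel(r, g, b):
    """Map 8-bit channel values to a signed weight and a bias."""
    magnitude = (r / 255.0) * WEIGHT_SCALE   # red: magnitude
    sign = 1.0 if b >= 128 else -1.0         # blue: sign, >=128 means positive
    bias = (g - 128) / 128.0                 # green: bias, assumed centred on 128
    return sign * magnitude, bias


def encode_weight(weight, bias=0.0):
    """Inverse of decode_pixel, clipping to the representable range."""
    r = round(min(abs(weight), WEIGHT_SCALE) / WEIGHT_SCALE * 255)
    b = 255 if weight >= 0 else 0
    g = max(0, min(255, round(bias * 128 + 128)))
    return r, g, b
```

With 8 bits of magnitude, every weight is quantised to one of 256 levels per sign, so a round trip through `encode_weight` and `decode_pixel` only recovers the weight approximately.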
Architecture
The model turns a text prompt into an image through a short pipeline:
prompt string
→ char-level embedding → 32-dim vector
→ W1 (64×32) → tanh
→ W2 (64×64) → tanh
→ W3 (3072×64) → sigmoid
→ reshape → 32×32×3 image
All weights live inside model.png. Opening the PNG is literally opening the neural network.
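The pipeline can be sketched in plain Python as follows. The layer shapes and activations follow the diagram. The rest is assumed for illustration: the `embed_prompt` hashing scheme, the omission of biases, the channel-last reshape order, and the random weights standing in for values decoded from model.png. It uses plain lists so that it runs without dependencies, whereas a real implementation would more likely use NumPy or PyTorch.

```python
import math
import random

EMBED_DIM, HIDDEN, OUT = 32, 64, 32 * 32 * 3  # sizes from the pipeline above


def embed_prompt(prompt, dim=EMBED_DIM):
    """Hypothetical char-level embedding: bucket character codes, then normalise."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def forward(prompt, w1, w2, w3):
    """Run the three-layer MLP; return a 32x32 grid of (r, g, b) floats in [0, 1]."""
    h1 = [math.tanh(z) for z in matvec(w1, embed_prompt(prompt))]  # 64 values
    h2 = [math.tanh(z) for z in matvec(w2, h1)]                    # 64 values
    flat = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w3, h2)]    # 3072 values
    pixels = [tuple(flat[i:i + 3]) for i in range(0, OUT, 3)]      # assumed channel-last
    return [pixels[row * 32:(row + 1) * 32] for row in range(32)]


def random_matrix(rows, cols, rng):
    """Stand-in for weights that would really be decoded from model.png."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w1 = random_matrix(HIDDEN, EMBED_DIM, rng)  # 64 x 32
w2 = random_matrix(HIDDEN, HIDDEN, rng)     # 64 x 64
w3 = random_matrix(OUT, HIDDEN, rng)        # 3072 x 64
image = forward("a cat", w1, w2, w3)
```

Because the final activation is a sigmoid, every output value lands in [0, 1] and can be scaled to 0–255 for saving.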
Usage
Training
Training is straightforward. You provide 6–20 image-prompt pairs, and the model learns to associate prompts with images by optimizing the pixel values directly:
python train.py
python train.py --epochs 500 --lr 0.05
Simple targets converge fastest: solid colors, gradients, and basic shapes work well. Typically, 200–500 epochs are sufficient, and a loss below 0.001 indicates good convergence for simple datasets.
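One way gradient descent can act on 8-bit pixel values is to keep float "latent pixels", run the forward pass on their quantised form, and apply the gradient to the floats (a straight-through update). The sketch below shows this on a single sigmoid layer with a hand-written MSE gradient. It is an assumed technique for illustration, not necessarily what train.py does. `SCALE`, the toy input, the target, and the learning rate are all hypothetical.

```python
import math

SCALE = 4.0  # assumed largest representable |weight|


def quantise(w):
    """Round-trip a float weight through an 8-bit magnitude, as a pixel would store it."""
    red = round(min(abs(w), SCALE) / SCALE * 255)  # magnitude channel
    sign = 1.0 if w >= 0 else -1.0                 # sign channel
    return sign * red / 255 * SCALE


def predict(latent, x):
    """One sigmoid layer evaluated with the quantised (pixel-encoded) weights."""
    out = []
    for row in latent:
        z = sum(quantise(w) * xi for w, xi in zip(row, x))
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def mse(y, target):
    return sum((a - b) ** 2 for a, b in zip(y, target)) / len(target)


def train(x, target, epochs=500, lr=2.0):
    """Gradient descent on float latents; the forward pass only sees quantised values."""
    latent = [[0.0] * len(x) for _ in target]
    for _ in range(epochs):
        y = predict(latent, x)
        for i, (yi, ti) in enumerate(zip(y, target)):
            # d(MSE)/dz_i for a sigmoid output
            grad_z = 2.0 / len(target) * (yi - ti) * yi * (1.0 - yi)
            for j, xj in enumerate(x):
                latent[i][j] -= lr * grad_z * xj  # straight-through update
    return latent


x = [1.0, 0.5, -0.5, 0.25]  # stand-in for a prompt embedding
target = [0.9, 0.1, 0.2]    # stand-in for one RGB target pixel
initial_loss = mse(predict([[0.0] * 4 for _ in range(3)], x), target)
latent = train(x, target)
final_loss = mse(predict(latent, x), target)
```

Keeping float latents avoids the stall that occurs when each small update is rounded back to the same 8-bit value. The cost is that the saved pixels only approximate the weights the optimiser found.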
Generation
Once trained, generating images is as simple as:
python main.py "red"
python main.py "a cat" --out cat_out.png --scale 8
The --scale 8 flag upscales the 32×32 output to 256×256 using nearest-neighbour interpolation to preserve the pixel structure.
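Nearest-neighbour upscaling simply repeats each pixel, which is why the blocky structure survives. A minimal sketch on nested lists follows. The function name `upscale_nearest` is hypothetical, and main.py may well delegate this to an imaging library instead.

```python
def upscale_nearest(image, scale):
    """Repeat every pixel `scale` times horizontally and vertically."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(scale)]  # stretch the row
        out.extend(list(wide) for _ in range(scale))     # repeat the stretched row
    return out


small = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
big = upscale_nearest(small, 2)
```

Applied to a 32×32 grid with `scale=8`, the same function yields the 256×256 output described above.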
File Structure
The repository is minimal and self-contained:
model.png   ← THE MODEL (64×3200 px, ~284 KB)
main.py     ← inference
train.py    ← training
model.py    ← architecture (pixels → weights → forward pass)
dataset/    ← training data
    cat.png
    cat.txt ← prompt: "a cat"
    ...
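The dataset layout suggests that each image is paired with a same-named text file holding its prompt. Assuming that convention holds for every pair, a loader could be sketched like this. The function name `load_pairs` is hypothetical, and images without a matching prompt file are skipped.

```python
from pathlib import Path


def load_pairs(dataset_dir):
    """Return (image_path, prompt) tuples for every PNG with a matching .txt file."""
    pairs = []
    for image_path in sorted(Path(dataset_dir).glob("*.png")):
        prompt_path = image_path.with_suffix(".txt")
        if prompt_path.exists():
            prompt = prompt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, prompt))
    return pairs
```

Calling `load_pairs("dataset")` would return the paths in sorted order, each with its prompt string, ready to be embedded and matched against the decoded image.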
Why Build This?
It's a toy. It's not useful. But it's cool that it works.
PixelModel has a fixed capacity of approximately 600K implicit parameters. While it won't replace your favorite diffusion model, it's a fascinating demonstration of how neural network weights can be encoded in unconventional ways.
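The capacity figure can be checked against the numbers given earlier. The arithmetic below uses only the layer shapes from the Architecture section and the 64×3200 canvas size. Reading "600K" as a count of channel values rather than weights is an assumption.

```python
# Weight counts implied by the layer shapes in the Architecture section.
layers = {"W1": (64, 32), "W2": (64, 64), "W3": (3072, 64)}
weights = {name: rows * cols for name, (rows, cols) in layers.items()}
total_weights = sum(weights.values())

# Capacity of the 64x3200 canvas listed under File Structure.
pixels = 64 * 3200
channel_values = pixels * 3
```

The three matrices hold 202,752 weights, which fits within the 204,800 pixels at one weight per pixel. The three channels together give 614,400 stored values, which is the closest match to the approximate 600K figure.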
The project explores the boundaries of how we think about model storage and representation. What if your model could be shared as simply as an image file? What if you could see your neural network just by opening it in an image viewer?
Try It Yourself
The full code and trained model are available on the Hugging Face Hub. Clone the repository, provide your own image-prompt pairs, and watch as gradient descent transforms a PNG into a functioning neural network.
Check out the PixelModel repository to get started.