I thought fine-tuning a small language model would be a weekend project... it took three full days
I wanted to build a basic AI that could write simple, clear summaries of tech news articles, using a small dataset of about 500 examples I'd written myself. I figured I'd load it into a model like Mistral 7B, run the training script, and be done by Sunday night.

The first run failed because my data formatting was wrong, and the output was just garbled text. The second attempt used too much memory and crashed my local machine after 8 hours. I finally got it running on a cloud service, but then I spent a whole day tweaking the learning rate and batch size to stop the model from collapsing into the same answer every time.

What I thought would take maybe 15 hours ended up closer to 30, with most of that spent fixing my own setup mistakes and bad guesses. Has anyone else had a simple training job spiral because of basic config issues?
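For anyone hitting the same garbled-output problem: in my case the fix was matching the model's expected prompt template exactly. This is just a minimal sketch of the idea, not my actual script — the instruction text and example pair are made up, and I'm assuming the Mistral-Instruct-style `[INST] ... [/INST]` markers here:

```python
# Sketch of the formatting step that bit me: every training example has to
# match the model's chat template, or fine-tuning produces garbled text.
# The instruction string and sample pair below are illustrative only.

def format_example(article: str, summary: str) -> str:
    """Wrap one (article, summary) pair in an instruct-style template."""
    instruction = "Summarize this tech news article in plain language."
    return f"<s>[INST] {instruction}\n\n{article} [/INST] {summary}</s>"

pair = ("Acme Corp announced a new low-power chip today.",
        "Acme launched a low-power chip.")
text = format_example(*pair)
```

If you're using a recent version of `transformers`, the tokenizer's built-in chat template is safer than hand-rolling strings like this, since the exact special tokens vary between model releases.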
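And on the memory crash: what eventually worked for me was shrinking the per-step batch and recovering the same effective batch size with gradient accumulation. The arithmetic is trivial but it's the knob I wish I'd known about on day one — the numbers below are illustrative, not from my actual run:

```python
# Rough sketch of the trade-off that fixed my OOM crash: a smaller batch
# per step, with gradient accumulation making up the difference, so the
# optimizer still sees the same effective batch size per update.

def effective_batch(per_device_batch: int, accumulation_steps: int,
                    num_gpus: int = 1) -> int:
    """Effective batch size the optimizer sees per weight update."""
    return per_device_batch * accumulation_steps * num_gpus

# A batch of 16 that OOMs can be replaced by batch 2 with 8 accumulation
# steps -- same effective batch, a fraction of the peak memory:
assert effective_batch(2, 8) == effective_batch(16, 1)
```

In Hugging Face's `Trainer` these map to the `per_device_train_batch_size` and `gradient_accumulation_steps` arguments, if that's the stack you're on.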