Is MD5 Really Dead? Separating Hash Hype From Reality

Every few months, someone posts a comment on Stack Overflow or a dev forum that goes something like: "You're using MD5? That's completely broken. Stop immediately." The response gets upvoted into the stratosphere, the original poster feels embarrassed, and another developer walks away with a half-formed understanding of hash functions that will haunt them for years.

Here's the thing: that advice isn't wrong, exactly. It's just catastrophically incomplete. And incomplete security advice is sometimes more dangerous than no advice at all, because it gives people confidence without context.

Let's actually dig into what MD5 is broken for, what it's perfectly fine for, and why the blanket "never use MD5" mantra — however well-intentioned — flattens a genuinely nuanced topic into mush.

What Does "Broken" Actually Mean?

When cryptographers say MD5 is broken, they mean something specific: collision resistance has been practically defeated. A collision is when two different inputs produce the same hash output. In 2004, Xiaoyun Wang and her colleagues demonstrated the first practical MD5 collision. By 2008, researchers had forged a rogue CA certificate using MD5 collisions. By 2012, the Flame malware (a nation-state cyberweapon) had used an MD5 collision to forge a Microsoft code-signing certificate and pass its payload off as a Windows Update. What has not been practically broken is preimage resistance: given an existing hash, no feasible attack is publicly known for finding an input that produces it.

These are genuinely catastrophic failures. They are also specifically failures in cryptographic security contexts — places where an attacker can deliberately craft inputs to fool a system.

MD5 also isn't suitable as a password hashing algorithm, but that's a different problem entirely. Password hashing needs to be intentionally slow (bcrypt, Argon2, scrypt exist for this reason). MD5 is fast by design. Using MD5 for passwords was never really correct, even back when collision attacks felt theoretical.
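
To make "intentionally slow" concrete, here is a minimal sketch using the scrypt function from Python's standard library. The cost parameters and the helper names (hash_password, verify_password) are illustrative assumptions, not tuned recommendations, and hashlib.scrypt is only available when Python is built against a recent enough OpenSSL.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Illustrative cost parameters (assumptions for this sketch, not tuned advice).
# n=2**14 and r=8 make each hash use roughly 16 MiB of memory.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

Each guess costs an attacker the same memory and CPU time that it costs you, which is exactly the property a fast hash like MD5 lacks.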

But neither of these issues — collision attacks or password hashing suitability — has anything to do with the majority of places you'll actually encounter MD5 in real developer tooling.

The Case MD5 Still Wins: File Integrity and Checksums

For years, downloading a Linux ISO from a major distribution's website meant finding an MD5 checksum sitting right next to it. Ubuntu did it. Debian did it. CentOS did it. Many have since moved to publishing SHA-256 sums, but the MD5 files did their job for a long time. Were all these projects staffed by security-illiterate people? Obviously not.

The thing is, MD5 checksums for file downloads solve a completely different problem than what cryptographic security addresses. They detect accidental corruption — a dropped bit during download, a flipped byte in transit, a storage error. For this purpose, MD5 is fast, universally supported, and entirely adequate.
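
A corruption check of this kind takes only a few lines. The sketch below uses Python's hashlib; the function names and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """True if the file's MD5 equals the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for multi-gigabyte images. A mismatch tells you the download is damaged; a match tells you nothing about who produced the file.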

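The collision attack concern only matters if an attacker is actively trying to substitute a malicious file that produces the same MD5 hash. Known attacks let an attacker build two colliding files they control; they do not let an outsider produce a file matching the hash of an existing release they had no hand in creating. For a trusted download mirror serving a pre-announced file, that threat model doesn't apply. And if you're worried about tampered downloads, a checksum hosted next to the file can be swapped along with it, whatever the algorithm, so the right answer is GPG signature verification, not switching from MD5 to SHA-256 checksums.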

Conflating these two concerns is at the heart of the "MD5 is dead" confusion. Checksums and cryptographic security aren't the same job. They just happen to use similar-looking tools.

Deduplication, Caching, and Data Fingerprinting

A huge number of production systems use MD5 for non-security fingerprinting tasks, and they're right to do so.

Content delivery networks use hash-based ETags to check whether a file has changed since the last request. Build systems like Gradle and many CI pipelines fingerprint artifact files to skip redundant work. Database-backed deduplication engines hash file contents to identify duplicates before storing them. None of these systems care about collision resistance — they care about speed, low collision probability in natural data, and broad ecosystem support.
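
As a rough sketch of what this fingerprinting looks like in practice, the snippet below groups identical blobs by MD5 and derives an ETag-style value from the same digest. The function names and the in-memory dictionary are hypothetical stand-ins for whatever storage layer a real system would use.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Content fingerprint for non-adversarial data."""
    return hashlib.md5(data).hexdigest()


def etag(data: bytes) -> str:
    """ETag-style value: the fingerprint wrapped in double quotes."""
    return '"' + fingerprint(data) + '"'


def find_duplicates(blobs):
    """Group names that share identical content; blobs maps name -> bytes."""
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[fingerprint(data)].append(name)
    # Only groups with more than one member are duplicates
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

For example, find_duplicates({"a.txt": b"same", "b.txt": b"same", "c.txt": b"other"}) groups a.txt and b.txt together and leaves c.txt out.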

MD5 produces a 128-bit output. For non-adversarial data, the probability that two given different files collide by accident is roughly 1 in 2^128, which is approximately one in 340 undecillion. Across a collection of n files, the chance that any pair collides grows to about n^2 / 2^129 (the birthday bound), which for a billion files is still around 10^-21. For any normal deduplication or caching use case, this is so close to zero that it's not worth measuring. The only scenario where collisions become a realistic concern is when a human adversary is deliberately engineering them, and for internal tooling and build systems, that threat model rarely applies.
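
If you want to check the arithmetic for your own dataset size, here is a small sketch. It assumes MD5 outputs behave like uniformly random 128-bit values on natural data, and it computes the standard upper bound of (number of pairs) / 2^bits; the function name is made up for this example.

```python
from fractions import Fraction


def accidental_collision_bound(n_items: int, bits: int = 128) -> Fraction:
    """Upper bound on P(any two of n random hashes collide): C(n, 2) / 2**bits."""
    pairs = n_items * (n_items - 1) // 2
    return Fraction(pairs, 2**bits)


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>16,d} files: bound = {float(accidental_collision_bound(n)):.3e}")
```

For a billion files the bound comes out at about 1.5e-21, which is far below the rate of undetected hardware errors you are already living with.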

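MD5 is also meaningfully faster than SHA-256 in software implementations without hardware acceleration. On older embedded hardware, constrained environments, or extremely high-throughput pipelines, this still matters. On CPUs with dedicated SHA instructions the gap can shrink or reverse, so measure on your own hardware before treating speed as the deciding factor.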
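
Speed claims are easy to test on the machine you actually care about. The following is a crude sketch of such a measurement, not a rigorous benchmark; the helper name, payload size, and round count are arbitrary assumptions, and results will vary with CPU, Python build, and OpenSSL version.

```python
import hashlib
import time


def throughput_mib_per_s(algorithm: str, payload: bytes, rounds: int = 20) -> float:
    """Hash payload `rounds` times and return approximate throughput in MiB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * rounds) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    payload = b"\x00" * (8 * 1024 * 1024)  # 8 MiB of zero bytes
    for name in ("md5", "sha256"):
        print(f"{name}: {throughput_mib_per_s(name, payload):.0f} MiB/s")
```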

Where You Absolutely Should Not Use MD5

Let's be completely direct about the cases where the warnings are fully justified.

TLS certificates and digital signatures: The Flame attack and the rogue CA certificate incident happened here. Using MD5 in any certificate authority workflow is genuinely dangerous and has been for over a decade. Modern CAs don't do this.

Message authentication codes: If you're using HMAC-MD5, you're in a gray zone. HMAC-MD5 holds up better than raw MD5 because HMAC's security does not depend on the collision resistance of the underlying hash, and no practical forgery attack against it is publicly known. You should still prefer HMAC-SHA256 for new systems. There's no good reason to reach for the older option.

Anything described as "secure" in user-facing documentation: If your system's security model depends on an attacker being unable to craft a colliding input, MD5 isn't safe. Full stop.

Password storage: Bcrypt. Argon2. Scrypt. Any of these. Not MD5, not SHA-256, not SHA-512. Password hashing is a specialized problem requiring per-password salts, intentional slowness, and ideally memory hardness. General-purpose hash functions, including "secure" ones like SHA-256, are the wrong tool here.

Anywhere a regulator or auditor has explicitly prohibited it: Standards such as PCI-DSS and FIPS restrict which algorithms count as acceptable, and MD5 is not a FIPS-approved hash function. If you're in a regulated environment, MD5 may be off the table for compliance reasons entirely separate from the technical risk, and that's a perfectly valid reason to avoid it.
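
For the message authentication case above, choosing HMAC-SHA256 in a new system costs nothing extra. This is a minimal sketch using Python's standard hmac module; the function names sign and verify are placeholders for this example, and key management is out of scope.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag_hex: str) -> bool:
    """Check a tag using a constant-time comparison to avoid timing leaks."""
    return hmac.compare_digest(sign(key, message), tag_hex)
```

The constant-time comparison matters: comparing tags with == can leak how many leading characters matched.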

The Tool That Gets Blamed for the Wrong Job

Here's a mental model that might help: MD5 is like a box cutter. Entirely appropriate for opening packages. Not appropriate for performing surgery. If someone tells you "box cutters are dangerous, never use them," they're not wrong in the surgical context — but they've also left you confused about why your warehouse team keeps buying them.

The "never use MD5" advice emerged from real incidents, primarily in the PKI and certificate ecosystem. Those incidents were real. The advice spread appropriately through the security community. Then it spread to developer communities where the nuance got sanded off, and now we have junior developers who think MD5 checksums in a private build system are a security incident.

This matters because cryptographic cargo-culting — applying security advice without understanding why — has its own costs. Developers who don't understand why MD5 is unsuitable for cryptographic use cases can't make good decisions when they encounter edge cases. They'll either avoid MD5 in places it's perfectly fine (minor inefficiency, no real harm) or worse, they'll replace MD5 with SHA-256 in a password storage system and feel like they've solved a problem when they've actually just made a slightly stronger version of the same mistake.

A Practical Decision Framework

When you're reaching for a hash function, ask yourself one question first: is an adversary part of my threat model?

If the answer is no — you're fingerprinting build artifacts, deduplicating files in an internal system, computing ETags for caching, or generating identifiers for non-security purposes — MD5 is fine. SHA-256 is also fine. Pick whichever the rest of your toolchain already uses and move on. Don't let hash algorithm selection become a week-long bikeshed discussion.

If the answer is yes — you need integrity guarantees an attacker can't subvert, you're signing data, or you're doing anything that appears in a threat model document — use SHA-256 or better. Do not use MD5. Also probably consult someone who specializes in applied cryptography rather than relying on a blog post, including this one.

If you're storing passwords: neither MD5 nor SHA-256 is the right tool. Use a proper password hashing function. This is a separate question entirely and the answer hasn't changed in fifteen years.
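
If it helps to see the framework as code, here is a deliberately tiny sketch. The function and its return strings are invented for illustration; it encodes only the three questions above and is no substitute for an actual threat model.

```python
def recommend_hash(adversary_in_threat_model: bool, storing_passwords: bool = False) -> str:
    """Map the three questions in the framework to a recommendation."""
    if storing_passwords:
        # Passwords are a separate problem regardless of the other answer
        return "password hashing function (bcrypt, Argon2, or scrypt)"
    if adversary_in_threat_model:
        return "SHA-256 or better"
    return "MD5 or SHA-256, whichever the toolchain already uses"
```

Note the order of the checks: the password question comes first because it overrides everything else.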

The Actual Lesson

MD5 is broken as a cryptographic primitive. It is not broken as a general-purpose hash function for non-adversarial use cases. These are different claims, and conflating them has been causing confusion in developer communities for twenty years.

The next time you see someone getting scolded for using MD5, look at what they're actually using it for. If it's a file checksum, a cache key, or a build artifact fingerprint, there's a decent chance the scolder is more confident than they are correct. The actual question to ask is whether an attacker has any motive or ability to craft a collision in that specific context, not whether MD5 has known vulnerabilities.

Good security engineering is context-dependent. "Never use X" advice that strips context is worth less than it sounds like, even when X is genuinely dangerous in the contexts where the warning originated.

Use the right tool for the right job. Understand why you're making the choice. And maybe, occasionally, push back on upvoted Stack Overflow comments before treating them as doctrine.