A Regex Cheatsheet for Real Tasks (Not the Textbook Version)

June 10, 2026 · 9 min read

The Cheatsheet Problem

Most regex cheatsheets are organized for the wrong audience. They list every metacharacter in alphabetical order, define \d as "a digit" and \w as "a word character," and then leave you to figure out why your pattern matches the wrong thing in production at 11pm on a Tuesday. The textbook approach treats regex as a syntax. In practice, it is a tool for a small number of recurring real-world tasks, and the useful reference is one organized around those tasks rather than around the alphabet.

This is that reference. It assumes you know roughly what a regular expression is, that you have written one or two before, and that what you actually want is a list of patterns that solve real problems plus a note on the gotcha that will bite you if you copy the pattern blindly. Where the JavaScript flavor differs from PCRE in a way that matters, I have called it out. The patterns are written for the JavaScript engine because that is what runs in the browser and in Node, but most of them work unchanged in Python and Ruby. Most also work in Go, with the exception of anything using lookarounds, which Go's RE2-based regexp package does not support.

If you want to follow along and confirm a pattern does what you expect, the regex tester on this site is intentionally minimal: paste a pattern, paste a test string, see what matches with the index and capture groups laid out. There is no signup and nothing leaves the page. I lean on it heavily while writing this kind of post, and I will reference it once or twice below.

The Building Blocks Worth Memorizing

Five concepts cover most of the regex work a working developer ever does. Memorize these and you will spend less time on the rest of the cheatsheet than you think.

Character classes. Square brackets define a set of characters that match one position. [abc] matches an a, b, or c. Ranges work: [a-z], [0-9], [A-Za-z0-9_]. Negation flips the set: [^0-9] matches anything that is not a digit. The shorthand classes \d, \w, and \s are the most common: digit, word character (letter, digit, or underscore), whitespace. Their negations are the uppercase versions: \D, \W, \S. Worth knowing: \w in JavaScript is ASCII-only by default. If you are matching a name with an umlaut or an accented character, \w will silently miss it. Use the u flag and Unicode property escapes (\p{L} for any letter) instead.
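A minimal sketch of the ASCII-only \w behavior, using a made-up sample name. The character class [\p{L}\p{N}_] is my own rough Unicode stand-in for \w, not a built-in equivalent.

```javascript
// \w covers only [A-Za-z0-9_] in JavaScript, even when the u flag is set.
const sampleName = "J\u00fcrgen"; // "Jürgen", written with an escape for clarity

const asciiWord = sampleName.match(/^\w+/);               // stops before the umlaut
const unicodeWord = sampleName.match(/^[\p{L}\p{N}_]+/u); // \p{...} requires the u flag

console.log(asciiWord[0]);   // "J"
console.log(unicodeWord[0]); // "Jürgen"
```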

Quantifiers. Quantifiers say how many times the previous element should match. ? means zero or one, * means zero or more, + means one or more, {n} means exactly n, {n,m} means between n and m. All of these are greedy by default — they match as much as they can and then back off if the rest of the pattern fails. Adding a trailing ? makes them lazy, matching as little as possible. The classic trap, which has cost more aggregate engineering time than any other regex mistake, is <.+> applied to <a>hello</a>. Greedy .+ swallows the entire string up to the last >, so you get one match covering everything. <.+?> with the lazy quantifier matches each tag separately. Once you have been bitten by this, you stop forgetting.
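Here is the greedy-versus-lazy trap as a runnable sketch. The third pattern, with a negated character class, is an alternative I am adding for comparison; it is not the only way to fix the problem.

```javascript
const html = "<a>hello</a>";

const greedy = html.match(/<.+>/g);      // .+ runs to the last ">" in the string
const lazy = html.match(/<.+?>/g);       // .+? stops at the first ">" it can
const negated = html.match(/<[^>]+>/g);  // never crosses a ">" in the first place

console.log(greedy);  // [ '<a>hello</a>' ]
console.log(lazy);    // [ '<a>', '</a>' ]
console.log(negated); // [ '<a>', '</a>' ]
```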

Anchors. Anchors do not consume characters; they assert a position. ^ is the start of the string (or the start of a line in multiline mode), $ is the end. \b is a word boundary, the position between a word character and a non-word character. The most common use of \b is to match whole words: \bcat\b matches cat in "the cat sat" but not cat in "catalog." If you ever find yourself writing [^a-zA-Z]cat[^a-zA-Z], you wanted \b.
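A short sketch of why \b beats the hand-rolled character classes: the hand-rolled version needs a real character on both sides, so it fails at the edges of the string. The sample strings are illustrative.

```javascript
const wholeWord = /\bcat\b/;
const handRolled = /[^a-zA-Z]cat[^a-zA-Z]/;

console.log(wholeWord.test("the cat sat")); // true
console.log(wholeWord.test("catalog"));     // false

// \b asserts a position, so the start and end of the string count as boundaries.
console.log(wholeWord.test("cat"));         // true
console.log(handRolled.test("cat"));        // false: no character on either side to consume
```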

Groups. Parentheses group elements together and, by default, capture them. (\d{3})-(\d{4}) applied to 555-1234 matches the whole string and captures 555 as group 1, 1234 as group 2. In a replacement string you refer to them as $1 and $2 (or \1 and \2 in some flavors). If you only need the grouping for the quantifier and you do not care about the capture, use a non-capturing group: (?:abc)+. This costs nothing and keeps your group numbers tidy. Named groups exist in modern JavaScript: (?<year>\d{4}), referenced as $<year> in replacements.
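The three kinds of group, sketched with the same inputs used above. Variable names are mine and purely illustrative.

```javascript
// Numbered captures: index 0 is the whole match, 1 and 2 are the groups.
const phonePattern = /(\d{3})-(\d{4})/;
const phoneMatch = "555-1234".match(phonePattern);
console.log(phoneMatch[1], phoneMatch[2]); // 555 1234

// The same captures, referenced from a replacement string.
const swapped = "555-1234".replace(phonePattern, "$2-$1");
console.log(swapped); // 1234-555

// Non-capturing group: grouped for the quantifier, but nothing is captured.
const repeated = "ababab".match(/(?:ab)+/);
console.log(repeated.length); // 1 (only the whole match)

// Named group: read it from .groups, or use $<name> in a replacement.
const dateMatch = "2026-06-10".match(/(?<year>\d{4})-(?<month>\d{2})/);
console.log(dateMatch.groups.year); // 2026
const bracketed = "2026-06-10".replace(/(?<year>\d{4})/, "[$<year>]");
console.log(bracketed); // [2026]-06-10
```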

Alternation. The pipe matches one of several alternatives: cat|dog|fish. Alternation has the lowest precedence, so ^cat|dog$ means "starts with cat or ends with dog," not "starts and ends with either." Group it: ^(cat|dog)$.
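The precedence problem is easy to see with two test strings. I use a non-capturing group here; a plain (cat|dog) behaves the same for matching purposes.

```javascript
const ungrouped = /^cat|dog$/;      // "starts with cat" OR "ends with dog"
const grouped = /^(?:cat|dog)$/;    // the whole string is "cat" or "dog"

console.log(ungrouped.test("catalog")); // true
console.log(ungrouped.test("hotdog"));  // true
console.log(grouped.test("catalog"));   // false
console.log(grouped.test("hotdog"));    // false
console.log(grouped.test("dog"));       // true
```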

The Patterns You Will Actually Reach For

Most regex work involves a small set of recurring tasks. Below is each one with a pattern that works, followed by the caveat you need to know before you ship it.

Matching Email Addresses

The honest pattern is [^\s@]+@[^\s@]+\.[^\s@]+. It matches anything that looks like an email by the loosest reasonable definition: some characters, an at sign, some characters, a dot, some more characters. It is wrong against the full RFC 5322 grammar, which allows quoted strings, comments, and almost any printable character in the local part. The strictly correct pattern is roughly eighty lines long and nobody uses it.

The right move in 99% of cases is to validate with the loose pattern, then send a confirmation email and let delivery tell you whether the address actually works. The only thing regex can do is reject obviously malformed input early: foo@bar, with no dot after the at sign, fails the loose pattern. A well-formed address at a mistyped or nonexistent domain, such as foo@gmial.con, will pass the pattern but fail in real use; that is fine, because the confirmation email will bounce or never be answered and you will catch it then. Do not spend an afternoon trying to make a regex enforce RFC 5322. That is not what regex is for.
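A minimal sketch of the loose check as it might sit in a form handler. The ^ and $ anchors and the trim() call are my additions, on the assumption that you are validating a whole input field rather than searching inside text; looksLikeEmail is a hypothetical helper name.

```javascript
// Loose shape check only: something, "@", something, ".", something.
const looseEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: returns true if the input is worth sending a confirmation to.
function looksLikeEmail(input) {
  return looseEmail.test(input.trim());
}

console.log(looksLikeEmail("user@example.com"));  // true
console.log(looksLikeEmail("foo@bar"));           // false: no dot after the "@"
console.log(looksLikeEmail("not an email"));      // false
console.log(looksLikeEmail("user@typo.invalid")); // true: only a confirmation email can catch this
```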

Extracting URLs from Text

A reasonable pattern is https?:\/\/[^\s)]+. It matches http or https, the colon and slashes, and then runs until whitespace or a closing parenthesis. The closing parenthesis is there because URLs in prose are often wrapped in (parens) and you want to exclude the trailing one. The cost is that the pattern cuts the match short if a URL legitimately contains a closing paren (Wikipedia article URLs are the classic case), so for those domains you may need to be more careful. It also keeps sentence punctuation that directly follows a URL, such as a trailing period or comma, so trim that in a second step if it matters. For 90% of log scraping and notification-text parsing, the simple version is fine.
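A quick sketch of the pattern on made-up prose, including the parenthesis case where it loses the end of the URL. The sample text and URLs are illustrative.

```javascript
const urlPattern = /https?:\/\/[^\s)]+/g;

const prose = "Docs (see https://example.com/docs) and http://example.org/a?b=1 for more.";
const urls = prose.match(urlPattern) ?? [];
console.log(urls); // [ 'https://example.com/docs', 'http://example.org/a?b=1' ]

// A URL that legitimately ends in ")" loses its last character.
const wiki = "https://en.wikipedia.org/wiki/Rust_(programming_language)";
const wikiMatch = wiki.match(urlPattern) ?? [];
console.log(wikiMatch[0]); // https://en.wikipedia.org/wiki/Rust_(programming_language
```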

Parsing Numbers and Decimals

Match a signed integer with -?\d+. Match a decimal with -?\d+(\.\d+)?. Match scientific notation with -?\d+(\.\d+)?([eE][+-]?\d+)?. Two gotchas. First, \d in JavaScript matches only the ASCII digits 0-9, with or without the u flag, but in some other flavors (Python 3 string patterns and .NET, for example) it also matches digits from other scripts, which is rarely what you want; write [0-9] if the pattern has to behave identically everywhere. Second, if you are parsing currency, do not do it in one regex. Strip the currency symbol and thousands separators in a separate pass and then parse the number. Trying to handle $1,234.56 in a single regex is the kind of clever that turns into a bug.
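A sketch of the number pattern and of the two-pass approach to currency. The stripping step assumes US-style formatting (comma as thousands separator, period as decimal point); other locales need different handling.

```javascript
// Non-capturing groups keep match() output to just the numbers.
const numberPattern = /-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/g;
const numbers = "-3 4.25 6.02e23 1e-9".match(numberPattern);
console.log(numbers); // [ '-3', '4.25', '6.02e23', '1e-9' ]

// Currency in two passes: strip everything that is not a digit, dot, or minus, then parse.
const rawPrice = "$1,234.56";
const cleaned = rawPrice.replace(/[^\d.-]/g, "");
const amount = Number(cleaned);
console.log(cleaned, amount); // 1234.56 1234.56
```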

Extracting Hex Color Codes

#[0-9a-fA-F]{6}\b for the six-digit form, #[0-9a-fA-F]{3}\b for the three-digit form. Combine with alternation: #(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b. The word boundary at the end stops the pattern from matching the first few characters of a longer run: without it, #abc123def would yield #abc123, and with it that string is skipped entirely because it is not a valid color.
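The combined pattern against a made-up stylesheet fragment, with and without the trailing \b so the difference is visible.

```javascript
const hexColor = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g;
const hexColorNoBoundary = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})/g;

const css = "color: #fff; background: #1a2B3c; border: #abc123def;";

const colors = css.match(hexColor);
console.log(colors); // [ '#fff', '#1a2B3c' ]

// Without \b, the nine-character run is cut down to its first six.
const sloppy = css.match(hexColorNoBoundary);
console.log(sloppy); // [ '#fff', '#1a2B3c', '#abc123' ]
```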

Matching IPv4 Addresses

The loose version is \b(?:\d{1,3}\.){3}\d{1,3}\b. The strict version requires each octet to be 0-255, which regex can express, but the result is ugly: each octet becomes (?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d). For log parsing where you trust the source, use the loose version and accept that 999.999.999.999 would match. For input validation, do the regex match and then validate each octet with normal code. Regex is the wrong tool for arithmetic bounds-checking, and trying to make it do that job is a common source of unreadable patterns.
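A sketch of the "regex for shape, code for range" split. isValidIPv4 is a hypothetical helper; I anchor the pattern with ^ and $ on the assumption that the whole input should be an address. Note that this version accepts leading zeros such as 01.02.03.04, which you may or may not want.

```javascript
// Shape only: four groups of one to three digits separated by dots.
const ipv4Shape = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

// Hypothetical helper: the regex checks the shape, plain code checks 0-255.
function isValidIPv4(input) {
  const match = ipv4Shape.exec(input);
  if (!match) return false;
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.0.1"));     // true
console.log(isValidIPv4("999.999.999.999")); // false: right shape, octets out of range
console.log(isValidIPv4("1.2.3"));           // false: wrong shape
```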

Splitting CSV-ish Strings

Do not. CSV looks like a regex problem and is not: quoted fields, embedded commas, and escaped quotes break any pattern you can write in one line. Use a proper CSV parser. The same applies to HTML, JSON, and source code in general. Regex matches regular languages, and formats with arbitrary nesting are not regular.

Replacing With Backreferences

Restructure a date from 2026-06-10 to 06/10/2026 with the pattern (\d{4})-(\d{2})-(\d{2}) and the replacement $2/$3/$1. This is one of the most useful regex tricks and one of the least appreciated by people who only use regex for matching. Any time you need to reshape strings — reformat phone numbers, convert between case styles, swap argument order in a function call across a codebase — captures plus replacement is the right tool.
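The date reshaping as runnable code, plus a replacer-function variant for case conversion. The snake_case example and its pattern are my own illustration of the "convert between case styles" idea, not the only way to do it.

```javascript
const isoDate = "2026-06-10";

// Numbered captures in the replacement string.
const usDate = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(usDate); // 06/10/2026

// The same thing with named groups, which survive pattern edits better.
const usDateNamed = isoDate.replace(
  /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
  "$<m>/$<d>/$<y>"
);
console.log(usDateNamed); // 06/10/2026

// A function replacement receives the captures as arguments.
const camel = "max_retry_count".replace(/_([a-z])/g, (_whole, letter) => letter.toUpperCase());
console.log(camel); // maxRetryCount
```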

Flags That Change Everything

JavaScript regex flags are written after the closing slash in literal form (/pattern/gi) or as a second argument to new RegExp(). The four that matter most:

g (global). Without g, the regex finds the first match and stops. With g, it finds all matches. The single most common bug in beginner regex is forgetting the g flag and wondering why replace only fixed the first occurrence. If you pass a regex to String.prototype.replaceAll, the g flag is required and the call throws a TypeError without it.

i (case-insensitive). Matches without regard to letter case. /hello/i matches "HELLO" and "Hello" and "hElLo." Use it whenever the input case is user-controlled and you do not care about the distinction.

m (multiline). Changes the meaning of ^ and $ to match at the start and end of each line, not just the start and end of the whole string. If you are parsing log lines from a file read in as a single blob, this is the flag you want.

s (dotall). Makes . match newline characters, which it normally does not. Useful for matching across line breaks. Without it, .+ will quietly stop at the first \n, which is occasionally what you want and usually not.
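All four flags in one short sketch. The log text is made up for illustration.

```javascript
// g: first match only versus every match.
const firstOnly = "a-b-c".replace(/-/, "+");
const everyMatch = "a-b-c".replace(/-/g, "+");
console.log(firstOnly, everyMatch); // a+b-c a+b+c

// replaceAll refuses a regex that lacks the g flag.
let replaceAllThrew = false;
try {
  "a-b-c".replaceAll(/-/, "+");
} catch (err) {
  replaceAllThrew = err instanceof TypeError;
}
console.log(replaceAllThrew); // true

// i and m together: ^ matches at each line start, case is ignored.
const logBlob = "ERROR disk full\ninfo ok\nerror timeout";
const errorLines = logBlob.match(/^error/gim);
console.log(errorLines); // [ 'ERROR', 'error' ]

// s: the dot only crosses the newline when dotall is on.
const withoutDotAll = /full.+ok/.test(logBlob);
const withDotAll = /full.+ok/s.test(logBlob);
console.log(withoutDotAll, withDotAll); // false true
```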

Lookarounds, Briefly

Lookahead ((?=...)) and lookbehind ((?<=...)) are zero-width assertions: they require a pattern to match at a position without consuming characters. They are useful for two real things. First, password validation: (?=.*\d)(?=.*[A-Z])(?=.*[!@#$%]).{8,} requires a digit, an uppercase letter, a symbol, and a minimum length, all without ordering. Second, matching a thing only when it is preceded or followed by something else: (?<=\$)\d+ matches a number only if a dollar sign precedes it, and the dollar sign is not part of the match. Negative versions exist with ! instead of =. Lookbehind was standardized in ES2018 and V8 has supported it since around then, but older Node versions and some other JavaScript runtimes (Safari before 16.4, for example) lack it. Test before you ship.
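Both uses as a sketch. I have added ^ and $ around the password pattern on the assumption that the whole input is being validated, and the negative lookbehind includes \d so it cannot start matching in the middle of a price. Sample strings are made up.

```javascript
// Three independent lookaheads, then the length requirement.
const strongPassword = /^(?=.*\d)(?=.*[A-Z])(?=.*[!@#$%]).{8,}$/;
console.log(strongPassword.test("Passw0rd!"));  // true
console.log(strongPassword.test("password1!")); // false: no uppercase letter
console.log(strongPassword.test("Aa1!"));       // false: too short

const receipt = "cost $25, qty 10, fee $7";

// Positive lookbehind: digits preceded by "$"; the "$" is not part of the match.
const prices = receipt.match(/(?<=\$)\d+/g);
console.log(prices); // [ '25', '7' ]

// Negative lookbehind: digits NOT preceded by "$" (or by another digit).
const quantities = receipt.match(/(?<![$\d])\d+/g);
console.log(quantities); // [ '10' ]
```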

The Two Failure Modes to Watch For

Two categories of regex bug cost the most time. They are worth flagging explicitly.

Catastrophic backtracking. When a pattern allows multiple ways to match the same string and the input is engineered (or accidentally happens) to force the engine to try them all, runtime explodes. The canonical example is (a+)+b against a long string of as with no trailing b: the engine tries every possible split of the as before concluding there is no match. In a request handler, this becomes a denial-of-service vector. In flavors that support them, atomic groups or possessive quantifiers cut off the backtracking. In JavaScript, where the engine also has no built-in time limit, the practical defense is to avoid nested quantifiers and to run any user-supplied regex where you can enforce a timeout from outside, such as a worker thread you can terminate.
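A sketch of the "avoid nested quantifiers" fix for the canonical example. The two patterns accept exactly the same strings; only the flattened one is safe to run on hostile input, so the risky pattern is deliberately tested on short strings only.

```javascript
const risky = /(a+)+b/; // nested quantifier: many ways to split a run of "a"s
const flat = /a+b/;     // same language, one way to match

// On short inputs the two patterns always agree.
const samples = ["ab", "aaab", "aaa", "b", "xaab"];
const agree = samples.every((s) => risky.test(s) === flat.test(s));
console.log(agree); // true

// A long run of "a"s with no "b" is the worst case for the nested pattern.
// Only the flat pattern is run here; do not try risky.test(hostile).
const hostile = "a".repeat(5000);
const flatResult = flat.test(hostile);
console.log(flatResult); // false
```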

The wrong tool entirely. Most of the time someone is debugging a regex for an hour, the right answer was a real parser. HTML, JSON, code, anything with arbitrary nesting — regex cannot handle these correctly because they are not regular languages. The moment a pattern starts growing nested groups and conditional lookarounds, stop. Reach for a parser library. Regex is the right tool for flat, repetitive, line-oriented text. It is the wrong tool for structured data.

When to Stop and Use a Tester

The fastest way to confirm a regex does what you think is to throw it into a tester with real sample input. Reading a pattern in your head and reasoning about whether it matches is a skill, but it is unreliable past about ten characters, and the cost of being wrong in production is much higher than the cost of pasting it into a tool for ten seconds. Any tester that shows you matches, capture groups, and indexes is fine for this; the one I built into this site is intentionally minimal because that is all I want it to do. Confirm the pattern matches what you expect, confirm it does not match what you don't, then ship.

Regex has a reputation for being arcane. It is not — it is a small language with a few real gotchas, and the gotchas are mostly the ones above. Most of what makes regex feel hard is being shown the syntax in the wrong order. Organize your mental model around the tasks you actually do, keep a tester open while you write anything non-trivial, and the rest is pattern-matching in the literal sense.
