I think this link -  https://swtch.com/~rsc/regexp/regexp1.html i- s the best place to start. Superb exposition on the background, theory, and implementation as well as a bit of history of how the industry lost its way with regular expressions.

Regular expressions are beautiful, simple, and widely misunderstood.

-rob


On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah <bakul@iitbombay.org> wrote:
On Jul 31, 2020, at 3:57 PM, Will Senn <will.senn@gmail.com> wrote:
>
> I've always been intrigued with regexes. When I was first exposed to them, I was mystified and lost in the greediness of matches. Now, I use them regularly, but still have trouble using them. I think it is because I don't really understand how they work.
> ...
> 1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
> 2. What are the 'best' implementations throughout unix (keep it pre 1980s)?
> 3. What are some of the milestones along the way (major changes, forks, disagreements)?
> 4. Where, in the source, or in a paper, would you point someone to wanting to better understand the mechanics of regex?

Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction

[I learned about regular expressions in an automata theory class,
 before I knew anything about Unix. What helped me was learning
 about finite state machines. You won't need more than paper and
 pencil to construct one. Reading source code would make more
 sense once you grasp how to construct a FSM corresponding to a RE.]