ScratchData LogoScratchData
Back to NickyNouse's profile

[Re-upload] RegEng: A Better Regular Expression Engine

NINickyNouse•Created December 12, 2022
[Re-upload] RegEng: A Better Regular Expression Engine
26
22
562 views
View on Scratch

Instructions

Simply enter the RegEx and the String to match against! Don't bookend the regex in /slashes/. Supported features: cAsE sEnSiTiVe [BROKEN IN SCRATCH 3.0] [aA-M] matches a, or any character between A and M [^abcd] matches everything but a, b, c, or d . matches any character \d for a digit (\D for everything else) \w for alphanumeric and _ (\W for everything else) \s for whitespace (just space and tab) (\S for everything else) \t for tab (\T for everything else) a slash before any other character escapes it () to group elements * to repeat the previous token 0 or more times + to repeat the previous token 1 or more times ? to repeat the previous token 0 or 1 times ^ at the beginning to match from the start of the string $ at the end to match to the end of the string

Description

RE-UPLOAD: the original version of this stopped working in 2019 because of a couple hacked blocks that aren't supported by Scratch 3.0. I was able to open this in Tosh and figure out how to remove the offending blocks. Case sensitivity seems to be broken now. ====== This is a Regular Expression (RegEx or RegExp) engine based on Thompson NFA. Regular Expressions are a way to describe a string you want to search for in another piece of text. I wrote a bad RegEx engine a while ago, but this time I did it right. It's got all the stuff. Ranges, order of operations, you name it. Feel free to use this in your project, but please give credit! (and let me know, I'm excited to see what people do with this!) Also please report any bugs you find! The algorithm I've implemented here, Thompson NFA, is crazy fast at so-called "pathological" expressions like /a?a?aaa/ or /(x+x+)+y/. (see https://www.regular-expressions.info/catastrophic.html). That's because an NFA (Nondeterministic Finite Automaton) is the "pure" way to do RegEx. There are literally cases (albeit contrived ones) where this Scratch project can do a search faster than Perl by millions of years (see https://swtch.com/~rsc/regexp/regexp1.html). That's all at the expense of newer features like backrefrences (/([a-c])x\1/), which can only be implemented using backtracking (the "impure" way to do RegEx). It also doesn't have repeat counts (/a{5}/) but mostly because I didn't bother to add it. It tries to be greedy but there are times it could be greedier -- there are a bunch of edge-cases here that might not be as greedy as you want them to. Experiment with it, and as always, when in doubt add more parentheses. Also I'm proud to say that this doesn't use that many hacked blocks, so you don't have to worry about dozens of throwaway lists and variables! These were a big help: - https://swtch.com/~rsc/regexp/regexp1.html - https://gist.github.com/gmenard/6161825 - https://dannysu.com/2015/10/31/regex-nfa/ (the last 2 links are just ports of the first one to languages I'm better at reading) PS. I didn't find this 'till after I posted this project but it's a real good read by a real good Scratcher: https://hardmath123.github.io/monster-match.html

Project Details

Project ID776082206
CreatedDecember 12, 2022
Last ModifiedDecember 13, 2022
SharedDecember 12, 2022
Visibilityvisible
CommentsAllowed