In the land of programming, text is king, so tools like grep, sed, and awk are fantastic companions to your compiler or interpreter. Every programmer’s toolbox can also benefit from a text-based version control system, like Git.
But these tools all work at the generic text level. While regular expressions can be incredibly powerful, they’re still not a perfect fit for the grammar of a programming language. For meaningful, semantic search, you need a different type of tool. Enter ast-grep.
What is ast-grep, and why would you need it?
Super-powered search that understands your code
Let’s say you want a list of all the function names in your codebase. A first attempt might look something like this:
grep -r 'function' ...
And then you might realize that you’re matching far too many irrelevant occurrences, so you refine your regular expression:
grep -rE '^function\W' ...
You continue in this fashion until, eventually, you realize that, no matter what you do, your search is going to match something like:
/*
The following should be:
function foo(a, b, c)
but that’s not going to work because...
*/
Tools like grep simply don’t understand the context of something that looks like a function declaration inside a comment. They work just by matching individual characters, not by considering any wider meaning that combinations of those characters may embody.
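To see the false positive in isolation, here is a minimal sketch that applies roughly the same regular expression to a small sample, line by line, the way grep would. The sample text and the `bar` function are made up for illustration.

```javascript
// Sample source: one mention inside a block comment, one real declaration.
const source = [
  "/*",
  "The following should be:",
  "function foo(a, b, c)",
  "but that's not going to work because...",
  "*/",
  "function bar(x) {",
  "  return x;",
  "}",
].join("\n");

// Rough equivalent of the grep pattern '^function\W'.
const pattern = /^function\W/;

// Test each line independently, with no knowledge of comments or syntax.
const matchingLines = source.split("\n").filter((line) => pattern.test(line));

console.log(matchingLines);
```

Both the commented-out `foo` line and the real `bar` declaration end up in `matchingLines`, because a line-based matcher has no way of knowing that the first one sits inside a comment.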
ast-grep is different. Instead of reading text character by character, line by line, it parses the text (just like a compiler) and builds an abstract syntax tree (AST) which represents your code’s actual meaning. Using that, it can then carry out semantic searches for elements like variable declarations, function calls, and so on.
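As a rough illustration of what searching a tree means, the sketch below hand-writes a heavily simplified tree for a call to console.log followed by a comment, then walks it looking for nodes of one kind. The node kinds are modeled on the names Tree-sitter's JavaScript grammar uses, but the object shape and the `findByKind` helper are assumptions made for this example, not ast-grep's real data structures or API.

```javascript
// Hand-written, simplified tree for:
//   console.log("hi")
//   // console.log('not a call')
const tree = {
  kind: "program",
  children: [
    {
      kind: "expression_statement",
      children: [
        {
          kind: "call_expression",
          children: [
            {
              kind: "member_expression",
              children: [
                { kind: "identifier", text: "console", children: [] },
                { kind: "property_identifier", text: "log", children: [] },
              ],
            },
            {
              kind: "arguments",
              children: [{ kind: "string", text: '"hi"', children: [] }],
            },
          ],
        },
      ],
    },
    { kind: "comment", text: "// console.log('not a call')", children: [] },
  ],
};

// Hypothetical helper: collect every node of a given kind, depth-first.
function findByKind(node, kind, found = []) {
  if (node.kind === kind) found.push(node);
  for (const child of node.children) findByKind(child, kind, found);
  return found;
}

const calls = findByKind(tree, "call_expression");
console.log(calls.length);
```

The comment contains the text of a call, but it is a single comment node, so a search for call expressions never sees it.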
Under the hood, ast-grep uses the popular Tree-sitter library, which lets it support a wide range of languages as diverse as Python, Java, and Go. ast-grep is written in Rust, so it performs well, even when searching across large codebases. That matters because ast-grep has to fully parse each file before searching it, which is a lot more work than a standard text-based search has to do.
How to search for almost anything using ast-grep
Using the command line or a web app that’s perfect for beginners
The simplest way to use ast-grep is with the -p (short for --pattern) option, which specifies a single pattern.
ast-grep -p 'console.log'
Without any path arguments, ast-grep will search for files in the current directory, recursively. Unless you specify a language, ast-grep will infer it based on a file’s extension. The program will group results by file, printing details of each match and highlighting the relevant parts.
Note that this search returns function calls with no arguments and a reference to the method property that isn’t even a call. However, it specifically looks for an object named console, so it won’t find the following:
with (console) {
log("Hello, world.");
}
But it will, correctly, ignore “'console.log'” as a string and “/* console.log */” as a comment. At this point, I advise you to explore the ast-grep playground, a web app that runs ast-grep in the background and presents its results.
The playground clearly indicates exactly what matched and what didn’t. This is in contrast to the command-line program, which filters input and only returns matches. I found the playground very useful when first learning about ast-grep and trying to build my own patterns.
You can use meta variables to match more dynamic content. A meta variable begins with a $ and uses only uppercase letters, along with underscores and digits. To find all calls to console.log with a single argument, use this pattern:
console.log($SINGLE_ARG)
Notice that the pattern matches calls with a single argument, not those without any or with more than one.
If you want to match multiple arguments, you can use a multi meta variable, which begins with three dollar signs:
console.log($$$MULTIPLE_ARGS)
This pattern will match all calls to console.log, including those with zero or several arguments.
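As a quick reference, here is a small sample file you could paste into the playground to compare the two patterns. The comments reflect the behavior described above; the `user` object is just a placeholder for this example.

```javascript
// Placeholder data so the calls below have something to print.
const user = { name: "Ada", id: 7 };

console.log();                     // matched by $$$MULTIPLE_ARGS only
console.log(user.name);            // matched by both patterns
console.log("user:", user.name);   // matched by $$$MULTIPLE_ARGS only
console.info(user.id);             // matched by neither: different method
```

The single meta variable stands for exactly one node, so only the second call satisfies it, while the multi meta variable accepts any number of arguments, including none.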
For more complex searches, ast-grep supports a rule syntax in YAML. You can use this to define highly-contextual searches, making full use of the AST structure.
For example, here’s a rule that matches all calls to console.log, console.debug, and console.warn, plus calls to console.error if they are not inside a catch clause:
id: no-console-except-error
language: typescript
rule:
  any:
    - pattern: console.error($$$)
      not:
        inside:
          kind: catch_clause
          stopBy: end
    - pattern: console.$METHOD($$$)
constraints:
  METHOD:
    regex: 'log|debug|warn'
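To make the rule's intent concrete, here is a short sample annotated with what the rule should and shouldn't flag, going by the description above. The `parseConfig` and `reportFailure` functions are hypothetical, and the code is plain JavaScript that is also valid TypeScript.

```javascript
// Hypothetical helper: parse a JSON config string, returning null on failure.
function parseConfig(text) {
  console.debug("parsing config");                  // flagged: debug is listed
  try {
    return JSON.parse(text);
  } catch (err) {
    console.error("invalid config:", err.message);  // allowed: inside a catch clause
    return null;
  }
}

// Hypothetical helper: report a failure outside of any catch clause.
function reportFailure(message) {
  console.error(message);      // flagged: console.error outside a catch clause
  console.info("reported");    // not flagged: info is not in the regex
}
```

The first alternative in the rule handles console.error through its position in the tree, while the second relies on the METHOD constraint to limit which method names count.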
Change your code without manually editing it
With careful preparation and testing, ast-grep can be a powerful automated editor
ast-grep is already a powerful tool, but it goes beyond search. With its replace feature, ast-grep lets you modify your code, too. This means you can make incredibly time-consuming changes in seconds, either automatically or interactively.
Consider the previous console.log example. Imagine one of your codebases that’s riddled with debug code that calls console.log. You decide you want something a bit more sophisticated, so you start with a bespoke function, my_logger:
function my_logger() {
    if (GLOBAL_DEBUG) {
        // Forward all arguments, keeping console as the receiver
        console.log.apply(console, arguments)
    }
}
It’s just a simple wrapper around console.log, gated by a global flag, for now. But the point is that you’ll need to convert each console.log() call to a my_logger() call. Using ast-grep, this is a simple process. Start with a rule file, e.g., fix-logging.yml:
id: fix-logging
language: javascript
rule:
  pattern: console.log($$$MULTIPLE_ARGS)
fix: my_logger($$$MULTIPLE_ARGS)
Then run it like this to see the changes that would be made:
ast-grep scan --rule fix-logging.yml
If you’re happy with the results, you can get ast-grep to make these changes for you, updating files in place with the --update-all option:
ast-grep scan --rule fix-logging.yml --update-all
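For a sense of what the rewrite produces, here is a sketch of a call site before and after the change. It uses a rest-parameter variant of the wrapper rather than the version shown earlier, and `GLOBAL_DEBUG` and the sample arguments are assumptions for this example.

```javascript
// Assumed global flag; a real project would set this elsewhere.
let GLOBAL_DEBUG = true;

// Rest-parameter variant of the wrapper, for illustration only.
function my_logger(...args) {
  if (GLOBAL_DEBUG) {
    console.log(...args);
  }
}

// Before the rewrite, a call site might read:
//   console.log("loaded", 3);
// After the rewrite, the arguments are carried over unchanged:
my_logger("loaded", 3);
```

Because the multi meta variable captures the whole argument list, every call keeps its original arguments, however many there are.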
There are various modes you can use to make changes, including an interactive one (the --interactive option), which is probably the safest way of running ast-grep while you’re getting to grips with it.
Despite the steep learning curve, mastering ast-grep will reward you
There’s no doubt about it: ast-grep is a complicated program. The examples I’ve included here are at the easier end of the scale, and if you dig into the documentation further, you’ll see this tool has a lot more to offer.
While it can be daunting, the playground helps a lot, and using interactive mode to clean up your codebase is a great way to start learning ast-grep.
