Lev's blog

Parsing SQL with pg_query

If you ever wanted to read a SQL query and understand what it's doing, you'd just do it. But if you're a computer, you need something called a parser. You can write one yourself, but you need a degree in linguistics (or if you know any dragons, they can help you out).

Fortunately, this is a solved problem so you needn't concern yourself. If you're like me and writing code is best done the hard way, install the Rust compiler and run this1 in your WezTerm2:

cargo add pg_query

This little crate is part Rust, part C, and a whole lot of Postgres. It actually takes its source code directly and wraps it around with some bindings, and voila, you have a fully functional PostgreSQL query parser.

Moving on to an example:

use pg_query::parse;

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_query_parsing() {
         let ast = parse("SELECT 1")
            .unwrap() // panic on syntax error
            .protobuf // pg_query uses protobuf to encode types
            .stmts // query can have multiple statements, e.g. SELECT 1; SELECT 2;
            .first() // take the first one
            .cloned() // and clone it to avoid lifetimes
            .unwrap() // we are sure the first statement exists
            .stmt // statements have metadata, but we just want the AST
            .unwrap() // statements can be empty, but ours isn't
            .node // it's an Abstract Syntax _Tree_, so let's get the root node
            .unwrap(); // and again, we're sure this Tree has one
        println!("{:#?}", ast);
    }
}

A tree is a large, recursive, structure. Printing it to the terminal won't be pretty, but here it goes:

SelectStmt(
  SelectStmt {
    distinct_clause: [],
    into_clause: None,
    target_list: [
      Node {
        node: Some(
          ResTarget(
            ResTarget {
              name: "",
              indirection: [],
              val: Some(
                Node {
                  node: Some(
                    AConst(
                      AConst {
                        isnull: false,
                        location: 7,
                        val: Some(
                          Ival(
                            Integer {
                              ival: 1,
                            },
                          ),
                        ),
                      },
                    ),
                  ),
                },
              ),
              location: 7,
            },
          ),
        ),
      },
    ],

    /* ... */
  },
)

Most of data structures in pg_query are not documented. That's fine though, since you can experiment with it by parsing different kinds of queries. A lot of the fields have self-explanatory names.

Fun things I learned:

The great thing about "TDD", your tests aren't throwaway code. You can just run them as part of your CI. They will detect any breakage in your dependencies as well.

Happy SQL parsing!

  1. pg_query

  2. I'm a big fan