[MyDB] 6 - TableManager: Field and Table Management, Part 2 - SQL Statement Parsing
- Preface
- SQL Grammar
- Parser implementation
  - Entry method Parse(byte[] statement)
  - Transaction control
    - parseBegin()
    - parseCommit(), parseAbort()
  - DDL (Data Definition Language)
    - parseCreate()
    - parseDrop()
  - DML statements
    - parseSelect()
    - parseInsert()
    - parseDelete()
    - parseUpdate()
  - WHERE clause
    - parseWhere()
    - parseSingleExp()
  - Helper methods
- Tokenizer implementation
  - Fields
  - peek(), pop()
  - popByte(), peekByte()
  - next()
  - nextMetaState()
  - nextQuoteState()
  - nextTokenState()
- References
Preface
The lexical analysis of SQL statements here works much like the lexer you would build in a compilers course.
Parser relies on Tokenizer to perform structured parsing of SQL-like statements (select, insert, update, delete, create, drop, begin, abort, commit), packaging the information carried by each statement into a dedicated statement class; these classes live in the server.parser.statement package.
Tokenizer, also in the parser package, scans a statement byte by byte and splits it into tokens according to whitespace and the lexical rules of the grammar. It exposes peek() and pop() so that Parser can consume tokens one at a time.
(What is a token? Take the SQL statement select * from tb_user: select, *, from and tb_user each count as one token.) A short usage sketch follows.
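The sketch below is hypothetical, not code from the article: it assumes the Parse entry method and the Select statement class described later, and only shows what kind of object a parsed statement turns into.

```java
// Hypothetical usage sketch: parse one statement and inspect the result.
// Assumes Parser.Parse(byte[]) and the Select class fields (fields,
// tableName, where) introduced later in this article.
byte[] sql = "select name from tb_user where id > 1".getBytes();
Object stat = Parser.Parse(sql);   // the first token "select" dispatches to parseSelect()
if (stat instanceof Select) {
    Select select = (Select) stat;
    // select.fields    -> ["name"]
    // select.tableName -> "tb_user"
    // select.where     -> a Where object describing "id > 1"
}
```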
SQL Grammar
```
<begin statement>
    begin [isolation level (read committed|repeatable read)]
        begin isolation level read committed

<commit statement>
    commit

<abort statement>
    abort

<create statement>
    create table <table name>
    <field name> <field type>
    <field name> <field type>
    ...
    <field name> <field type>
    [(index <field name list>)]
        create table students
        id int32,
        name string,
        age int32,
        (index id name)

<drop statement>
    drop table <table name>
        drop table students

<select statement>
    select (*|<field name list>) from <table name> [<where statement>]
        select * from student where id = 1
        select name from student where id > 1 and id < 4
        select name, age, id from student where id = 12

<insert statement>
    insert into <table name> values <value list>
        insert into student values 5 "Zhang Yuanjia" 22

<delete statement>
    delete from <table name> <where statement>
        delete from student where name = "Zhang Yuanjia"

<update statement>
    update <table name> set <field name>=<value> [<where statement>]
        update student set name = "ZYJ" where id = 5

<where statement>
    where <field name> (>|<|=) <value> [(and|or) <field name> (>|<|=) <value>]
        where age > 10 or age < 3

<field name> <table name>
    [a-zA-Z][a-zA-Z0-9_]*

<field type>
    int32 int64 string

<value>
    .*
```
Parser implementation
Entry method Parse(byte[] statement)
Parse(byte[] statement) receives the SQL statement as a byte array and dispatches to the matching parse method based on the first token.
This is the only public entry point. At its core it uses the Tokenizer class to split the statement into tokens, then wraps the parsed information into the appropriate Statement class and returns it. The process is straightforward: the first token alone determines the statement type, and each type is handled by its own method.
```java
/**
 * Parse the statement byte array into a statement object using Tokenizer
 * @param statement
 * @return
 * @throws Exception
 */
public static Object Parse(byte[] statement) throws Exception {
    Tokenizer tokenizer = new Tokenizer(statement);
    String token = tokenizer.peek(); // read the first token
    tokenizer.pop();                 // consume the first token

    Object stat = null;
    Exception statErr = null;
    // dispatch to the matching parse method based on the first token
    try {
        switch(token) {
            case "begin":
                stat = parseBegin(tokenizer);
                break;
            case "commit":
                stat = parseCommit(tokenizer);
                break;
            case "abort":
                stat = parseAbort(tokenizer);
                break;
            case "create":
                stat = parseCreate(tokenizer);
                break;
            case "drop":
                stat = parseDrop(tokenizer);
                break;
            case "select":
                stat = parseSelect(tokenizer);
                break;
            case "insert":
                stat = parseInsert(tokenizer);
                break;
            case "delete":
                stat = parseDelete(tokenizer);
                break;
            case "update":
                stat = parseUpdate(tokenizer);
                break;
            case "show":
                stat = parseShow(tokenizer);
                break;
            default:
                throw Error.InvalidCommandException;
        }
    } catch(Exception e) {
        statErr = e;
    }
    try {
        // if the tokenizer still has tokens left, the statement is invalid
        String next = tokenizer.peek();
        if(!"".equals(next)) {
            byte[] errStat = tokenizer.errStat();
            statErr = new RuntimeException("Invalid statement: " + new String(errStat));
        }
    } catch(Exception e) {
        e.printStackTrace();
        byte[] errStat = tokenizer.errStat();
        statErr = new RuntimeException("Invalid statement: " + new String(errStat));
    }
    if(statErr != null) {
        throw statErr;
    }
    return stat;
}
```
Transaction control
- parseBegin(): parses BEGIN [ISOLATION LEVEL READ COMMITTED/REPEATABLE READ].
- parseCommit() and parseAbort(): handle the plain commit and rollback statements.

The statement classes these methods produce are sketched below.
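The class bodies here are inferred from the parser code in this article (parseBegin() sets isRepeatableRead; commit and abort carry no data), not copied from the source:

```java
// Rough sketch of the transaction statement classes, inferred from the
// parser code below.
public class Begin {
    public boolean isRepeatableRead;  // false = read committed (default), true = repeatable read
}

public class Commit {
}

public class Abort {
}
```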
parseBegin()
Parses a begin statement; the grammar is begin [ISOLATION LEVEL READ COMMITTED/REPEATABLE READ].
```java
// Parse a begin statement: begin [isolation level (read committed | repeatable read)]
private static Begin parseBegin(Tokenizer tokenizer) throws Exception {
    String isolation = tokenizer.peek();
    Begin begin = new Begin();
    if("".equals(isolation)) {
        return begin;
    }
    if(!"isolation".equals(isolation)) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();
    String level = tokenizer.peek();
    if(!"level".equals(level)) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();
    String tmp1 = tokenizer.peek();
    if("read".equals(tmp1)) {
        tokenizer.pop();
        String tmp2 = tokenizer.peek();
        if("committed".equals(tmp2)) {
            tokenizer.pop();
            if(!"".equals(tokenizer.peek())) {
                throw Error.InvalidCommandException;
            }
            return begin;
        } else {
            throw Error.InvalidCommandException;
        }
    } else if("repeatable".equals(tmp1)) {
        tokenizer.pop();
        String tmp2 = tokenizer.peek();
        if("read".equals(tmp2)) {
            begin.isRepeatableRead = true;
            tokenizer.pop();
            if(!"".equals(tokenizer.peek())) {
                throw Error.InvalidCommandException;
            }
            return begin;
        } else {
            throw Error.InvalidCommandException;
        }
    } else {
        throw Error.InvalidCommandException;
    }
}
```
parseCommit(), parseAbort()
Parses the transaction-ending statements; the grammar is simply commit / abort.
```java
// Parse an abort statement; the grammar is just: abort
private static Abort parseAbort(Tokenizer tokenizer) throws Exception {
    if(!"".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    return new Abort();
}

// Parse a simple commit statement
private static Commit parseCommit(Tokenizer tokenizer) throws Exception {
    if(!"".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    return new Commit();
}
```
DDL (Data Definition Language)
- parseCreate(): parses CREATE TABLE, including the field names, field types and the index definition. For example:
  CREATE TABLE <table name> <field 1> <type 1>, <field 2> <type 2> (INDEX <index field>)
- parseDrop(): parses DROP TABLE <table name>.

The Create and Drop statement classes these methods fill in are sketched below.
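A rough sketch of those two classes; the field names are taken from the assignments in the parser code that follows, while the public-field layout is an assumption:

```java
// Rough sketch of the DDL statement classes, matching the fields that
// parseCreate() and parseDrop() assign.
public class Create {
    public String tableName;
    public String[] fieldName;  // column names
    public String[] fieldType;  // column types: int32 / int64 / string
    public String[] index;      // indexed column names
}

public class Drop {
    public String tableName;
}
```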
parseCreate()
The grammar is: CREATE TABLE <table name> <field 1> <type 1>, <field 2> <type 2> ... (INDEX <index field 1> <index field 2>); note that the index fields are separated by spaces, not commas.
```java
/**
 * Parse a create table statement, including field names, field types and indexes.
 * e.g. CREATE TABLE <table name> <field 1> <type 1>, <field 2> <type 2> ... (INDEX <index field 1> <index field 2>)
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Create parseCreate(Tokenizer tokenizer) throws Exception {
    if(!"table".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    Create create = new Create();
    String name = tokenizer.peek(); // 1. read the table name
    if(!isName(name)) {
        throw Error.InvalidCommandException;
    }
    create.tableName = name;

    List<String> fNames = new ArrayList<>();
    List<String> fTypes = new ArrayList<>();
    // 2. parse the field names and field types
    while(true) {
        tokenizer.pop();
        String field = tokenizer.peek();
        // a "(" means the index definition starts here, leave the loop
        if("(".equals(field)) {
            break;
        }
        if(!isName(field)) {
            throw Error.InvalidCommandException;
        }
        tokenizer.pop();
        String fieldType = tokenizer.peek();
        if(!isType(fieldType)) {
            throw Error.InvalidCommandException;
        }
        fNames.add(field);
        fTypes.add(fieldType);
        tokenizer.pop();
        String next = tokenizer.peek();
        if(",".equals(next)) {
            continue;
        } else if("".equals(next)) {
            throw Error.TableNoIndexException;
        } else if("(".equals(next)) {
            break;
        } else {
            throw Error.InvalidCommandException;
        }
    }
    create.fieldName = fNames.toArray(new String[fNames.size()]);
    create.fieldType = fTypes.toArray(new String[fTypes.size()]);

    tokenizer.pop();
    if(!"index".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    // 3. parse the index fields
    List<String> indexes = new ArrayList<>();
    while(true) {
        tokenizer.pop();
        String field = tokenizer.peek();
        if(")".equals(field)) {
            break;
        }
        if(!isName(field)) {
            throw Error.InvalidCommandException;
        } else {
            indexes.add(field);
        }
    }
    create.index = indexes.toArray(new String[indexes.size()]);
    tokenizer.pop();

    if(!"".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    return create;
}
```
parseDrop()
Parses a drop statement; the grammar is drop table <table name>.
```java
/**
 * Parse a drop statement; the grammar is: drop table <table name>
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Drop parseDrop(Tokenizer tokenizer) throws Exception {
    if(!"table".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 1. read the table name
    String tableName = tokenizer.peek();
    if(!isName(tableName)) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 2. no tokens may remain after the table name
    if(!"".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }

    Drop drop = new Drop();
    drop.tableName = tableName;
    return drop;
}
```
DML statements
- parseSelect(): parses a SELECT statement, supporting either a field list or *, a FROM table name, and an optional WHERE condition.
- parseInsert(): parses INSERT INTO <table name> VALUES <value list>.
- parseDelete(): parses DELETE FROM <table name> WHERE <condition>.
- parseUpdate(): parses UPDATE <table name> SET <field>=<value> WHERE <condition>.

The DML statement classes these methods produce are sketched below.
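As with the DDL classes, these are plain data holders; the sketch below is inferred from the fields assigned in the parse methods that follow, with the public-field layout as an assumption:

```java
// Rough sketch of the DML statement classes, inferred from the parse
// methods in this article.
public class Select {
    public String tableName;
    public String[] fields;  // selected column names, or ["*"]
    public Where where;      // null when there is no where clause
}

public class Insert {
    public String tableName;
    public String[] values;  // one value per column, in table order
}

public class Delete {
    public String tableName;
    public Where where;
}

public class Update {
    public String tableName;
    public String fieldName;
    public String value;
    public Where where;      // null when there is no where clause
}
```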
parseSelect()
The grammar is SELECT (<field list> | *) FROM <table name> [WHERE <condition>].
```java
/**
 * Parse a select statement; the grammar is: SELECT (<field list> | *) FROM <table name> [WHERE <condition>]
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Select parseSelect(Tokenizer tokenizer) throws Exception {
    Select read = new Select();

    // 1. read the field list
    List<String> fields = new ArrayList<>();
    String asterisk = tokenizer.peek();
    // 1.1 a single * selects all fields
    if("*".equals(asterisk)) {
        fields.add(asterisk);
        tokenizer.pop();
    } else {
        // 1.2 otherwise parse a comma-separated field list
        while(true) {
            String field = tokenizer.peek();
            if(!isName(field)) {
                throw Error.InvalidCommandException;
            }
            fields.add(field);
            tokenizer.pop();
            if(",".equals(tokenizer.peek())) {
                tokenizer.pop();
            } else {
                break;
            }
        }
    }
    read.fields = fields.toArray(new String[fields.size()]);

    if(!"from".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 2. read the table name
    String tableName = tokenizer.peek();
    if(!isName(tableName)) {
        throw Error.InvalidCommandException;
    }
    read.tableName = tableName;
    tokenizer.pop();

    // 3. parse the optional where clause
    String tmp = tokenizer.peek();
    if("".equals(tmp)) {
        read.where = null;
        return read;
    }
    read.where = parseWhere(tokenizer);
    return read;
}
```
parseInsert()
Parses INSERT INTO <table name> VALUES <value list>.
Note that the insert statement does not take a column list; only the table name is given, so a value must be supplied for every field.
For example, given a table tb_user(int64 id, string name),
the insert is written as insert into tb_user values 1 "hx",
and you cannot write insert into tb_user values "hx" and omit the id.
```java
/**
 * Parse an insert statement; the grammar is: INSERT INTO <table name> VALUES <value list>
 * No column list is supported, so a value must be given for every field.
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Insert parseInsert(Tokenizer tokenizer) throws Exception {
    Insert insert = new Insert();

    if(!"into".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 1. read the table name
    String tableName = tokenizer.peek();
    if(!isName(tableName)) {
        throw Error.InvalidCommandException;
    }
    insert.tableName = tableName;
    tokenizer.pop();

    if(!"values".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }

    // 2. read values until no tokens remain
    List<String> values = new ArrayList<>();
    while(true) {
        tokenizer.pop();
        String value = tokenizer.peek();
        if("".equals(value)) {
            break;
        } else {
            values.add(value);
        }
    }
    insert.values = values.toArray(new String[values.size()]);
    return insert;
}
```
parseDelete()
Parses DELETE FROM <table name> WHERE <condition>.
```java
/**
 * Parse a delete statement; the grammar is: DELETE FROM <table name> WHERE <condition>
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Delete parseDelete(Tokenizer tokenizer) throws Exception {
    Delete delete = new Delete();

    if(!"from".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 1. read the table name
    String tableName = tokenizer.peek();
    if(!isName(tableName)) {
        throw Error.InvalidCommandException;
    }
    delete.tableName = tableName;
    tokenizer.pop();

    // 2. parse the where clause
    delete.where = parseWhere(tokenizer);
    return delete;
}
```
parseUpdate()
Parses UPDATE <table name> SET <field>=<value> WHERE <condition>.
```java
/**
 * Parse an update statement; the grammar is: UPDATE <table name> SET <field>=<value> [WHERE <condition>]
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Update parseUpdate(Tokenizer tokenizer) throws Exception {
    Update update = new Update();
    // 1. read the table name
    update.tableName = tokenizer.peek();
    tokenizer.pop();

    // 2. the next token must be "set"
    if(!"set".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 3. read the field name
    update.fieldName = tokenizer.peek();
    tokenizer.pop();

    // 4. the next token must be "="
    if(!"=".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 5. read the new value
    update.value = tokenizer.peek();
    tokenizer.pop();

    // 6. no where clause: return the update as-is
    String tmp = tokenizer.peek();
    if("".equals(tmp)) {
        update.where = null;
        return update;
    }

    // 7. otherwise parse the where clause
    update.where = parseWhere(tokenizer);
    // 8. return the update object
    return update;
}
```
WHERE clause
- parseWhere(): parses the condition expression, supporting a single condition or two conditions joined by AND/OR.
- parseSingleExp(): handles a single condition of the form <field> <operator> <value>, supporting the =, > and < comparison operators.

The Where and SingleExpression classes they build are sketched below.
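A Where holds at most two single conditions and the logical operator between them. The sketch below is inferred from the fields assigned in parseWhere() and parseSingleExp(); the public-field layout is an assumption:

```java
// Rough sketch of the where-clause classes, inferred from the parser code.
public class Where {
    public SingleExpression singleExp1;
    public String logicOp;              // "and", "or", or "" when there is only one condition
    public SingleExpression singleExp2;
}

public class SingleExpression {
    public String field;      // column name
    public String compareOp;  // "=", ">" or "<"
    public String value;      // the literal to compare against
}
```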
parseWhere()
Parses a condition expression, supporting a single condition or two conditions joined by AND/OR.
```java
/**
 * Parse a where clause: a single condition, or two conditions joined by AND/OR.
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static Where parseWhere(Tokenizer tokenizer) throws Exception {
    Where where = new Where();

    if(!"where".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    tokenizer.pop();

    // 1. parse the first condition
    SingleExpression exp1 = parseSingleExp(tokenizer);
    where.singleExp1 = exp1;

    // 2. parse the logical operator, if any
    String logicOp = tokenizer.peek();
    if("".equals(logicOp)) {
        where.logicOp = logicOp;
        return where;
    }
    if(!isLogicOp(logicOp)) {
        throw Error.InvalidCommandException;
    }
    where.logicOp = logicOp;
    tokenizer.pop();

    // 3. parse the second condition
    SingleExpression exp2 = parseSingleExp(tokenizer);
    where.singleExp2 = exp2;

    if(!"".equals(tokenizer.peek())) {
        throw Error.InvalidCommandException;
    }
    return where;
}
```
parseSingleExp()
Handles a single condition of the form <field> <operator> <value>, supporting the =, > and < comparison operators.
```java
/**
 * Parse a single condition of the form <field> <operator> <value>, supporting =, > and <.
 * e.g. id > 1
 * @param tokenizer
 * @return
 * @throws Exception
 */
private static SingleExpression parseSingleExp(Tokenizer tokenizer) throws Exception {
    SingleExpression exp = new SingleExpression();

    // 1. read the field name
    String field = tokenizer.peek();
    if(!isName(field)) {
        throw Error.InvalidCommandException;
    }
    exp.field = field;
    tokenizer.pop();

    // 2. read the comparison operator
    String op = tokenizer.peek();
    if(!isCmpOp(op)) {
        throw Error.InvalidCommandException;
    }
    exp.compareOp = op;
    tokenizer.pop();

    // 3. read the value
    exp.value = tokenizer.peek();
    tokenizer.pop();
    return exp;
}

// Check whether the token is a valid comparison operator: =, > or <
private static boolean isCmpOp(String op) {
    return ("=".equals(op) || ">".equals(op) || "<".equals(op));
}

// Check whether the token is a logical operator: and / or
private static boolean isLogicOp(String op) {
    return ("and".equals(op) || "or".equals(op));
}
```
Helper methods
- isName(): checks whether a token is a legal identifier (it rejects a single-character token that is not a letter).
- isType(): validates a field type (int32, int64, string).
- isCmpOp() and isLogicOp(): validate the comparison and logical operators.

This logic is simple and is not elaborated further; a rough sketch of isName() and isType() is given below.
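The sketch is based only on the checks described above; it assumes a static isAlphaBeta(byte) helper like the one the Tokenizer uses, and is not copied from the source:

```java
// Rough sketch of the validation helpers described above (assumed bodies;
// isAlphaBeta is taken to test for an ASCII letter).
private static boolean isName(String name) {
    // reject a single-character token that is not a letter, e.g. "(" or ","
    return !(name.length() == 1 && !isAlphaBeta(name.getBytes()[0]));
}

private static boolean isType(String tp) {
    return ("int32".equals(tp) || "int64".equals(tp) || "string".equals(tp));
}
```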
Tokenizer implementation
As mentioned earlier, the core methods of Tokenizer are peek() and pop().
Tokenizer splits the input byte stream into tokens, handling whitespace, quoted strings, symbols and ordinary identifiers, and hands those tokens to Parser. It has to get the state transitions right, handle errors sensibly, and produce tokens in exactly the format Parser expects.
Fields
```java
public class Tokenizer {
    private byte[] stat;          // the statement bytes being parsed
    private int pos;              // current position in the byte array
    private String currentToken;  // the most recently parsed token
    private boolean flushToken;   // whether the next peek() should parse a new token
    private Exception err;        // error raised during parsing, if any
}
```
peek(), pop()
peek() calls next() to fetch the next token and caches it in currentToken. The cached token is returned until pop() is called, which sets flushToken to true so that the next peek() parses a fresh token.
```java
/**
 * Peek at the next token without consuming it.
 * @return
 * @throws Exception
 */
public String peek() throws Exception {
    if(err != null) {
        throw err;
    }
    if(flushToken) {
        String token = null;
        try {
            token = next();
        } catch(Exception e) {
            err = e;
            throw e;
        }
        currentToken = token;
        flushToken = false;
    }
    return currentToken;
}

/**
 * Consume the current token: set flushToken to true so the next peek() parses a new token.
 */
public void pop() {
    flushToken = true;
}
```
popByte(), peekByte()
```java
/**
 * Consume one byte: advance pos, but never past the end of the statement.
 */
private void popByte() {
    pos ++;
    if(pos > stat.length) {
        pos = stat.length;
    }
}

/**
 * Look at the current byte without consuming it; returns null at the end of the statement.
 */
private Byte peekByte() {
    if(pos == stat.length) {
        return null;
    }
    return stat[pos];
}
```
next()
As mentioned above, peek() calls next() to obtain the next token; next() in turn delegates the actual work to nextMetaState().
```java
// Parse the next token
private String next() throws Exception {
    if(err != null) {
        throw err;
    }
    return nextMetaState();
}
```
nextMetaState()
nextMetaState() is the core method. It first skips whitespace, then decides which state to enter based on the type of the next character: a symbol, a quoted string, or an ordinary token.
- A symbol is returned directly as a single character.
- Content inside quotes is handled by nextQuoteState().
- An ordinary token (letters, digits and underscores) is handled by nextTokenState().
```java
// Entry point of token parsing: dispatch to a handler based on the current character type.
private String nextMetaState() throws Exception {
    // 1. skip whitespace
    while(true) {
        Byte b = peekByte();
        if(b == null) {
            return "";
        }
        if(!isBlank(b)) {
            break;
        }
        popByte();
    }
    byte b = peekByte();
    // 2. dispatch according to the current character
    if(isSymbol(b)) {
        popByte();
        return new String(new byte[]{b});
    } else if(b == '"' || b == '\'') {
        return nextQuoteState();   // quoted string
    } else if(isAlphaBeta(b) || isDigit(b)) {
        return nextTokenState();   // identifier or keyword
    } else {
        err = Error.InvalidCommandException;
        throw err;
    }
}
```
nextQuoteState()
Parses a string token enclosed in quotes.
```java
/**
 * Parse a quoted string token.
 * @return
 * @throws Exception
 */
private String nextQuoteState() throws Exception {
    byte quote = peekByte();
    popByte();
    StringBuilder sb = new StringBuilder();
    while(true) {
        Byte b = peekByte();
        if(b == null) {
            // the statement ended before the closing quote
            err = Error.InvalidCommandException;
            throw err;
        }
        if(b == quote) {
            popByte();
            break;
        }
        sb.append(new String(new byte[]{b}));
        popByte();
    }
    return sb.toString();
}
```
nextTokenState()
Parses an ordinary token (identifier or keyword).
```java
/**
 * Parse an identifier or keyword made of letters, digits and underscores.
 * @return
 * @throws Exception
 */
private String nextTokenState() throws Exception {
    StringBuilder sb = new StringBuilder();
    while(true) {
        Byte b = peekByte();
        if(b == null || !(isAlphaBeta(b) || isDigit(b) || b == '_')) {
            // consume a trailing blank so the next token starts at a non-blank byte
            if(b != null && isBlank(b)) {
                popByte();
            }
            return sb.toString();
        }
        sb.append(new String(new byte[]{b}));
        popByte();
    }
}
```
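nextMetaState() and nextTokenState() also rely on a few character-classification helpers that the article does not list. The sketch below keeps the method names used in the calls above; the bodies are assumptions consistent with the lexical rules of the grammar:

```java
// Sketch of the Tokenizer's character-classification helpers (assumed
// bodies, matching how they are called above).
static boolean isDigit(byte b) {
    return (b >= '0' && b <= '9');
}

static boolean isAlphaBeta(byte b) {
    return ((b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z'));
}

static boolean isSymbol(byte b) {
    // single-character tokens the parser expects: comparison operators,
    // the asterisk, comma and parentheses
    return (b == '>' || b == '<' || b == '=' || b == '*' ||
            b == ',' || b == '(' || b == ')');
}

static boolean isBlank(byte b) {
    return (b == ' ' || b == '\t' || b == '\n');
}
```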
References
MYDB 9. 字段与表管理 (Field and Table Management) | 信也のブログ (shinya.click)
字段与表管理 (Field and Table Management) | EasyDB (blockcloth.cn)