Script Lexing Posts

Alex Angelopoulos (aka at mvps dot org)

This is a summary of my posts to the scripting newsgroups on the subject of self-lexing scripts.

Stripping VBScript prior to tokenizing

Date: 2002-11-14 18:30:04 PST

An initial attempt at processing script via script, using regular expressions on the entire script to pre-process, then going through and "normalizing" lines of code.
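As a rough illustration of that kind of pre-processing (not the code from the post), here is a minimal Python sketch. It assumes "normalizing" means dropping apostrophe comments, joining `_` line continuations, and discarding blank lines; the names `strip_comment` and `normalize` are hypothetical, and `Rem` comments are deliberately not handled.

```python
import re

# A string literal (with "" as an escaped quote) or an apostrophe comment.
# Matching strings first keeps an apostrophe inside a string from being
# mistaken for the start of a comment.
_STRING_OR_COMMENT = re.compile(r'"(?:[^"]|"")*"|\'.*')


def strip_comment(line):
    """Remove an apostrophe comment from one line, leaving strings intact."""
    def keep_strings(match):
        text = match.group(0)
        return text if text.startswith('"') else ""
    return _STRING_OR_COMMENT.sub(keep_strings, line)


def normalize(source):
    """Return logical lines: comments stripped, continuations joined."""
    logical = []
    pending = ""  # text carried over from lines ending in a continuation
    for raw in source.splitlines():
        line = strip_comment(raw).strip()
        if line == "_" or line.endswith((" _", "\t_")):
            pending += line[:-1].rstrip() + " "
            continue
        line = pending + line
        pending = ""
        if line:
            logical.append(line)
    if pending.strip():
        logical.append(pending.strip())
    return logical


sample = (
    "Dim s ' declare\n"
    's = "it\'s" & _\n'
    '    "ok" \' trailing\n'
    "\n"
    "' only a comment\n"
    "WScript.Echo s"
)
result = normalize(sample)
```

For the sample above, `result` holds three logical lines: the `Dim`, the joined assignment, and the `WScript.Echo` call.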

Script Cooker

Date: 2002-11-22 06:22:58 PST

Second iteration of the ideas from 11-14, packaged into a component and able to "code mine" to extract functions from script, do crude pretty-printing, etc.

Script Parsing Part 1: The Way to NOT do it.

Date: 2002-12-13 15:54:02 PST

I'm good at posts like this! ;)

General summation of some of the issues and ideas I had in dealing with VBScript as a text unit. Looking back, I think this post marks when I was just beginning to realize the importance of dealing with script on an atomic level, as a character stream, as opposed to any larger intermediary units.

Script Parsing Part 2: Brief Discussion of State Machines

Date: 2002-12-14 11:23:48 PST

An arm-waving overview of "this is what a state machine is all about". Simplistic.

Script Parsing Part 3: Simplistic Finite State Machine for VBScript

Date: 2002-12-17 11:46:49 PST

Examples for the prior post. I started with a "degenerate" state machine which had input and output but only a single state. I followed it with a 2-state machine and a lightly commented FSM for parsing script, derived from "Script Cooker".

Includes links to some web resources. I've changed my mind about the merit of some, particularly Libero...
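To make the single-state versus 2-state distinction concrete, here is a small Python sketch. It is not the example from the post: the task (uppercasing text, then uppercasing only the text outside string literals) and the names `one_state` and `two_state` are assumptions chosen for illustration.

```python
def one_state(text):
    """Degenerate machine: one state, so every character is treated alike."""
    return "".join(ch.upper() for ch in text)


CODE, STRING = "CODE", "STRING"


def two_state(text):
    """Two states: uppercase code, but pass string literals through as-is."""
    state = CODE
    out = []
    for ch in text:
        if state == CODE:
            out.append(ch.upper())
            if ch == '"':
                state = STRING  # opening quote: enter the string
        else:
            out.append(ch)
            if ch == '"':
                state = CODE    # closing quote: back to code
    return "".join(out)


line = 'msgbox "Hi ""there"" x" & y'
flat = one_state(line)
aware = two_state(line)
```

A doubled quote inside a string flips the machine to CODE and straight back to STRING, so the second state is enough to get this case right without a third.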

Finite State Machine based VBScript Parser

Date: 2002-12-22 11:45:00 PST

I believe this is actually a lexer, NOT a parser, and I did it the hard way, reading character by character and building tokens. It is implemented as a class "fsmParser". It reads a character "stream" and emits tokens to a stack; when it hits the end of a statement, it emits the complete statement as a token list (actually an array).

It does not handle some niceties such as date expressions very well. In terms of practical value, it is probably much less useful than "Script Cooker". As theory, some people may find it useful for FSM concepts, but I was still very unclear on the idea, so even that is doubtful.
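The general shape of such a character-at-a-time lexer can be sketched in Python as follows. This is not the "fsmParser" class itself: the class name `FsmLexer`, the state names, and the token rules are all assumptions. Like the original, it ignores date literals, and it also splits decimal numbers at the dot and does not handle line continuations.

```python
class FsmLexer:
    """Reads one character at a time and collects tokens per statement."""

    def __init__(self):
        self.state = "START"
        self.buffer = ""      # characters of the token being built
        self.tokens = []      # tokens of the statement in progress
        self.statements = []  # completed statements, each a list of tokens

    def _flush(self):
        if self.buffer:
            self.tokens.append(self.buffer)
            self.buffer = ""

    def _end_statement(self):
        self._flush()
        if self.tokens:
            self.statements.append(self.tokens)
            self.tokens = []
        self.state = "START"

    def feed(self, ch):
        if self.state == "COMMENT":
            if ch == "\n":
                self._end_statement()
        elif self.state == "STRING":
            self.buffer += ch
            if ch == '"':
                self.state = "QUOTE"  # end of string, or first half of ""
        elif self.state == "QUOTE":
            if ch == '"':             # doubled quote: still inside the string
                self.buffer += ch
                self.state = "STRING"
            else:
                self._flush()
                self.state = "START"
                self.feed(ch)         # reprocess the character in START
        elif self.state == "WORD":
            if ch.isalnum() or ch == "_":
                self.buffer += ch
            else:
                self._flush()
                self.state = "START"
                self.feed(ch)
        else:  # START
            if ch in "\n:":
                self._end_statement()
            elif ch == "'":
                self.state = "COMMENT"
            elif ch == '"':
                self.buffer = ch
                self.state = "STRING"
            elif ch.isalnum() or ch == "_":
                self.buffer = ch
                self.state = "WORD"
            elif not ch.isspace():
                self.tokens.append(ch)  # single-character operator


def lex(source):
    lexer = FsmLexer()
    for ch in source + "\n":  # trailing newline closes the last statement
        lexer.feed(ch)
    return lexer.statements


statements = lex('x = "a""b" & y1 \' note\nCall Foo(1) : z = 2')
```

Each entry in `statements` is one statement's token list, with the colon treated as a statement separator in the same way as a newline.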

Code Parsing Redux - stateless regex-based subtractive tokenizing

Date: 2003-02-27 11:52:54 PST

I have no idea what to call this. This is almost pure code - a regex-based direct parser. Instead of switching states and looking at input, it uses a single master state which checks for conformance to a set of patterns.
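One way to read "stateless, subtractive" is a single loop that tries each pattern against the head of the remaining text, records the match, and removes it from the input until nothing is left. The Python sketch below follows that reading; it is an interpretation rather than the code from the post, and the pattern table and token names are assumptions.

```python
import re

# Ordered pattern table: the first pattern matching at the head wins.
PATTERNS = [
    ("space",   re.compile(r"[ \t]+")),
    ("comment", re.compile(r"'[^\n]*")),
    ("string",  re.compile(r'"(?:[^"\n]|"")*"')),
    ("number",  re.compile(r"\d+(?:\.\d+)?")),
    ("word",    re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("newline", re.compile(r"\r?\n")),
    ("symbol",  re.compile(r"<>|<=|>=|[-+*/\\^&=<>(),.:]")),
]


def tokenize(text):
    """Repeatedly match at the head of the text and cut the match off."""
    tokens = []
    while text:
        for kind, pattern in PATTERNS:
            match = pattern.match(text)
            if match:
                if kind not in ("space", "comment"):
                    tokens.append((kind, match.group(0)))
                text = text[match.end():]  # the "subtractive" step
                break
        else:
            raise ValueError("no pattern matches at: %r" % text[:20])
    return tokens


tokens = tokenize('x = 3.5 & "a""b" \' c\n')
```

There is no state variable here: all the context an FSM would carry between characters is folded into the patterns themselves, which is why the order of the table matters.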