A Basic Compiler – A Great Refactoring Opportunity

Years ago (2006) I wrote a compiler for TI BASIC, the dialect of BASIC that I learned on my TI 99/4A (many many years earlier). This is an “ancient” computer language and one of the first that I learned. (I had a few years of experience with Apple Basic on Apple IIe before getting a TI.)

I was reminded of this project tonight when I added my GitHub repositories to my LinkedIn profile. There is now a slightly larger chance that someone might actually see this code. Which leads me to my point: there is a lot of bad code in that repository. I hadn’t really seen the “refactoring light” at that point in my career, and there are large chunks of that code base that are badly in need of it.

The one saving grace is that there are actual tests included with the project. I took code examples from the reference manual and typed them in as separate programs. I also coded up the expected output from each program in Python scripts. I then have a Makefile that compiles each sample program, runs it, and pipes the output to the Python script, which checks that each line of output matches what is expected.
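The checking step can be sketched roughly like this. It is not the script from the repository, just a minimal illustration of the idea; the function name `check_output` is made up for this post, and it assumes the compiled program's output arrives as lines of text.

```python
def check_output(actual_lines, expected_lines):
    """Return a list of mismatch descriptions; an empty list means a pass."""
    # Strip line endings so piped output compares cleanly with the expectations.
    actual = [line.rstrip("\r\n") for line in actual_lines]
    problems = []
    for number, (got, want) in enumerate(zip(actual, expected_lines), start=1):
        if got != want:
            problems.append(f"line {number}: expected {want!r}, got {got!r}")
    # zip stops at the shorter sequence, so check the line counts separately.
    if len(actual) != len(expected_lines):
        problems.append(
            f"expected {len(expected_lines)} lines, got {len(actual)}"
        )
    return problems
```

A Makefile rule could then pipe a sample program's output into a small script that passes `sys.stdin` to this function and exits with a non-zero status when the returned list is not empty.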

So this means that I can be fairly confident in refactoring some of this code 6 years later. I hardly know where to start, however. I think the best thing to do is to just dive into the Main method and start performing lots of Extract Method refactorings.

Switching Topics – About BASIC Badness

Grade school and high school gave me about 9 years of programming in BASIC. I arrived at college with years of programming experience and yet had never heard of a compiler. I recall snickering to myself about how archaic any language must be that requires a compiler (a whole separate tool) before it could be run. I couldn’t fathom the purpose of such a tool. (At this time we didn’t have the plethora of scripting languages that also have this great feature of my original BASIC.)

I started learning about “structured” programming, as it was called at the time. I didn’t learn any object-oriented languages (they were fairly new and not taught). I learned Pascal and then C. We learned not OO but how to use structured constructs. Again, I snickered to myself. What could this terminology mean?

Well, anyway, I learned how to program in C, Pascal and Fortran, and then later in life on jobs I learned C++, Visual Basic, and eventually C#, Java and others. I became so comfortable with my new languages that I don’t recall there ever being a moment when I realized the code I was writing was completely different from BASIC.

Later in life I heard many of the famous quotes by Edsger W. Dijkstra, like this one:

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”

I didn’t really understand what was meant. It had been years since I had studied or used BASIC, but I couldn’t recall it being bad. Well, when I started the compiler project that is the topic of this post, I got to become reacquainted with BASIC and see it in a fresh light. It is truly frightening, and I wonder how I ever got any program to work. Here is just one small example program (no indentation, no “structure”, no white space, no symbolic names for objects or methods to help explain what is going on):

[VB]
100 REM
101 REM
108 DIM STACK$(100)
109 REM
110 GOSUB 1000
130 INPUT "String: ":STRING$
140 FOR I = 1 TO LEN(STRING$)
150 CHAR$ = SEG$(STRING$, I, 1)
151 STACKVAL$ = CHAR$
155 LEFTBRACKET = (CHAR$="(") + (CHAR$="[") + (CHAR$="{")
156 RIGHTBRACKET = (CHAR$=")") + (CHAR$="]") + (CHAR$="}")
160 IF LEFTBRACKET THEN 180
170 IF RIGHTBRACKET THEN 190
175 GOTO 250
180 GOSUB 2000
185 GOTO 250
190 GOSUB 3000
195 MATCH = ((STACKVAL$="(")*(CHAR$=")") + (STACKVAL$="[")*(CHAR$="]") + (STACKVAL$="{")*(CHAR$="}"))
200 IF MATCH=0 THEN 300
250 NEXT I
260 GOSUB 4000
270 IF STACKCOUNT<>0 THEN 300
280 PRINT "Match"
290 GOTO 301
300 PRINT "No match detected at pos "&STR$(I)
301 INPUT "Another String (Y or N): ":AGAIN$
302 IF (AGAIN$="Y") THEN 110
310 END
999 REM
1000 STACKIDX = -1
1010 STACKVAL$ = ""
1020 RETURN
1999 REM
2000 STACKIDX = STACKIDX + 1
2010 STACK$(STACKIDX) = STACKVAL$
2020 RETURN
3000 REM
3010 REM
3020 REM
3030 IF (STACKIDX > -1) THEN 3060
3040 STACKVAL$ = ""
3050 GOTO 3080
3060 STACKVAL$ = STACK$(STACKIDX)
3070 STACKIDX = STACKIDX - 1
3080 RETURN
4000 REM
4010 REM
4020 REM
4030 STACKCOUNT = STACKIDX + 1
4040 RETURN
[/VB]

Actually, this isn’t nearly as bad as it was “back in the day”. I turned on a little syntax highlighting. Also, before I copied this code I removed the REMARKS and just left placeholders for where they should go. Also, I actually did try to structure this program. Try to see if you can figure out what it is doing and then look at this version with remarks and a little white space.

[VB]

100 REM Checks a string for matching brackets
101 REM '(', ')', '[', ']', '{', '}'

108 DIM STACK$(100)

109 REM Initialize Stack
110 GOSUB 1000
130 INPUT "String: ":STRING$

140 FOR I = 1 TO LEN(STRING$)
150 CHAR$ = SEG$(STRING$, I, 1)
151 STACKVAL$ = CHAR$
155 LEFTBRACKET = (CHAR$="(") + (CHAR$="[") + (CHAR$="{")
156 RIGHTBRACKET = (CHAR$=")") + (CHAR$="]") + (CHAR$="}")
160 IF LEFTBRACKET THEN 180
170 IF RIGHTBRACKET THEN 190
175 GOTO 250
180 GOSUB 2000
185 GOTO 250
190 GOSUB 3000
195 MATCH = ((STACKVAL$="(")*(CHAR$=")") + (STACKVAL$="[")*(CHAR$="]") + (STACKVAL$="{")*(CHAR$="}"))
200 IF MATCH=0 THEN 300
250 NEXT I
260 GOSUB 4000
270 IF STACKCOUNT<>0 THEN 300
280 PRINT "Match"
290 GOTO 301
300 PRINT "No match detected at pos "&STR$(I)
301 INPUT "Another String (Y or N): ":AGAIN$
302 IF (AGAIN$="Y") THEN 110
310 END

999 REM SUBROUTINE 1000 Initializes the stack to be empty
1000 STACKIDX = -1
1010 STACKVAL$ = ""
1020 RETURN

1999 REM This subroutine pushes STACKVAL$ onto stack
2000 STACKIDX = STACKIDX + 1
2010 STACK$(STACKIDX) = STACKVAL$
2020 RETURN

3000 REM This subroutine pops a value off of the stack
3010 REM and puts it into variable STACKVAL$. If the
3020 REM stack is empty then STACKVAL$ will get the empty string
3030 IF (STACKIDX > -1) THEN 3060
3040 STACKVAL$ = ""
3050 GOTO 3080
3060 STACKVAL$ = STACK$(STACKIDX)
3070 STACKIDX = STACKIDX - 1
3080 RETURN

4000 REM This routine updates STACKCOUNT with the number
4010 REM of items in the STACK. STACKCOUNT is only reliable
4020 REM if this subroutine is called before inspecting it.
4030 STACKCOUNT = STACKIDX + 1
4040 RETURN

[/VB]
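For contrast, here is roughly how the same check might look in a language with structured constructs. This is only a sketch in Python, not code from the compiler project; the names `PAIRS` and `first_mismatch` are my own, and it mirrors the BASIC program's behavior of reporting the 1-based position where the mismatch is detected.

```python
# Maps each closing bracket to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def first_mismatch(text):
    """Return None if brackets balance, else the 1-based position of the problem."""
    stack = []
    for position, char in enumerate(text, start=1):
        if char in PAIRS.values():
            stack.append(char)
        elif char in PAIRS:
            # An empty stack or the wrong opener both count as a mismatch.
            if not stack or stack.pop() != PAIRS[char]:
                return position
    # Leftover openers are reported just past the end, as the BASIC loop does.
    return len(text) + 1 if stack else None
```

The stack, the push and the pop that took three numbered subroutines and a shared STACKVAL$ variable in BASIC are a local list and two method calls here, and the GOTO targets are replaced by an ordinary loop with an early return.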

It was a lot of fun learning to program in TI BASIC, but boy I sure don’t miss it.

About Michael

I am a software developer and part-time professor. I enjoy studying and discussing mathematics, computer science and software development.