An Open Pre-compiler for Embedded SQL

1.Problem Statement

Our system includes approximately 150 modules that use embedded SQL. In the beginning, we were interested in testing the system against various types of databases, including commercial and open-source, as part of a Linux port. Some of the databases that we tested, such as MySQL, do not include a pre-compiler for ESQL. Rewriting these modules to the many vendor APIs was not reasonable, so we decided to create the pre-compiler we needed.

The pre-compiler is an awk script with different versions for MySQL, ODBC, Oracle and PostgreSQL databases. NonStop SQL/MP databases will require specific coding and minor changes to enable them to port from other database systems.

2.Embedded SQL

i. Sample Program

The sample program below selects rows from a sample table and prints them. To maintain the clarity of the sample program, error handling is not included. The pre-compiler approach is designed for C/C++, COBOL, and PLI, but this example will only use C/C++. For each statement, we will show the MySQL and the ODBC coding.

ii. Declaring Host Variables

The programmer of the database API is usually required to specify the type of host variable. The pre-compiler generally needs to parse the declarations in order to determine the types. Some databases are limited to simple declarations, while others allow structures, typedefs, etc. into the declare section. Our pre-compiler permitted these more complex declarations, but we did not want to completely parse the C structures. We realized that we could generate C++ code from the initial C and ESQL code and write a light-weight wrapper around the database API that uses polymorphism to ensure that the correct types are used. The C++ compiler will then simplify the process. The awk script simply takes the declare section and comments out the EXEC SQL macros for MySQL and ODBC databases.

iii. Declaring a SELECT Statement

The pre-compiler creates a static variable with the text of the SQL statement. Additional terms are included to determine whether the statement has been prepared or not. The MySQL code is as follows: [insert code]. For ODBC, it is almost identical with minor changes, such as replacing the host variables with a question mark (“?”) instead of the names: [insert code].

iv. Executing a SELECT Statement

When the program opens the ESQL cursor, the SELECT statement is executed: [insert code]. The first step is to ensure a valid connection to the database. Systems that used pre-compiled and bound SQL have implicit connections. For ODBC, the statement is prepared and the host variables bound before execution.

v. Fetching Data

The FETCH statement specifies the target variables for the data: [insert code]. Note that the specification of the types of variables is not necessary when fetching from the cursor. Because we used polymorphism, the C++ compiler is able to determine the correct function to call: [insert code].

vi. Closing the SELECT Statement

Closing the cursor is the simplest step of all: [insert code]. This step translates to the following MySQL code: [insert code].

vii. Modified Code

At this point, the EXEC SQL statements are replaced by calls to the generated functions: [insert code].

viii. Insert/Update/Delete

Handling these statements is relatively straightforward. The SELECT…INTO statement, a single-row read is also handled as a special case.


i.The awk Code

We generally use GNU awk or gawk, to execute the code. The completed code is less than 1,000 lines in length, including comments. The code relies on regular expressions to locate embedded SQL in the program, as well as manipulation of the SQL statements themselves. As an example, recognizing the ESQL macros is performed in this manner: [insert code]. The getStatement () function reads through the input file until it reaches the terminating semicolon of the SQL statement. This function understands both single and double quotes inside the SQL but will be unable to decipher unbalanced quotes. The getStatement function returns the entire SQL statement as a single string of an Embed PDF in HTML.

We use several routines to convert the code to a plain vanilla SQL by removing Tandem SQL/MP-specific syntax, etc. For ODBC, host variables are replaced with a question mark. Host variables are recognized by regular expressions: [insert code]. Most of the awk code generates the required inline code for database calls. We have additional options to generate the debug code.

ii. Helper Routines

To allow the C++ compiler to match the types, we wrap the API with inline functions. For MySQL, we substitute the parameters directly into the statement itself: [insert code]. For ODBC query parameters: [insert code].

iii. Some More Details

The pre-compiler was able to solve minor SQL differences from one dialect to the next. NonStop SQL by HP requires dates to be handled differently. The pre-compiler uses #line directives to enable programmers to de-bug against the original source code and follow the same indentation pattern as the original code.

iv. To-Do List

1. The current code is not especially generic and requires a separate awk script for each target database.

2. We may re-write the code in Perl to give it more traction in the open-source community.

3. We intend to remove and/or isolate the application-specific code.

4. To achieve a more generic pre-compiler will require more robust pattern matching. We will avoid yacc grammar to prevent negative effects.

4. Conclusion

This code is used to compile a large production system that runs on Linux I4-64 using MySQL. It took us two days to get our system running on MySQL, using ESQL/C.