An open Pre-compiler for Embedded SQL

Updated on: 19.05.2026

Building a custom ESQL pre-compiler is a practical solution when you're porting a large codebase to databases that lack native embedded SQL support. This article walks you through the architecture, implementation details, and real-world lessons from building an awk-based pre-compiler that targets MySQL, ODBC, Oracle, and PostgreSQL — helping you avoid a full rewrite of your SQL modules.

The Problem: 150 Modules, No Pre-compiler

When porting a production system with approximately 150 modules that use embedded SQL to Linux, the team faced a critical obstacle: not all target databases — MySQL being a prime example — ship with a pre-compiler for ESQL. Rewriting every module to vendor-specific APIs was simply not a viable path. The decision to build a custom pre-compiler was both pragmatic and necessary.

The result is an awk script with distinct versions for MySQL, ODBC, Oracle, and PostgreSQL. NonStop SQL/MP databases require additional, database-specific handling and minor code adjustments when porting from other systems.

Understanding Embedded SQL

Embedded SQL (ESQL) allows developers to write SQL statements directly inside a host language such as C/C++, COBOL, or PL/I. The pre-compiler described here targets C/C++ specifically, though the concepts apply broadly. Each stage below is described for both the MySQL and the ODBC code paths.

Declaring Host Variables

Any database API requires the programmer to specify the type of each host variable. This means the pre-compiler must parse declarations to determine those types. Some databases only support simple declarations; others allow full C structures and typedefs in the declare section. Rather than writing a full C parser, the pre-compiler generates C++ code from the original C and ESQL source, wrapping the database API with a lightweight layer that uses polymorphism to enforce correct types — letting the C++ compiler do the heavy lifting. For MySQL and ODBC, the awk script takes the declare section and comments out the EXEC SQL macros.
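A minimal sketch of the overloading idea, written in C++ because that is what the pre-compiler generates. The names `SqlType`, `HostParam` and `hostParam()` are hypothetical and not taken from the actual generated code. The point is that the C++ compiler, not the awk script, resolves each host variable's type, even for structure members declared through typedefs.

```cpp
#include <cstddef>

// Hypothetical type tags a generated wrapper might hand to a database API.
enum class SqlType { Int, Long, Double, Char };

// One host variable as the API needs it: address, type and buffer size.
struct HostParam {
    void*       address;
    SqlType     type;
    std::size_t length;   // size in bytes; for Char, the buffer capacity
};

// Overloads: the compiler picks one from the variable's declared type,
// so the pre-compiler never has to parse the C declaration itself.
inline HostParam hostParam(int& v)    { return {&v, SqlType::Int, sizeof v}; }
inline HostParam hostParam(long& v)   { return {&v, SqlType::Long, sizeof v}; }
inline HostParam hostParam(double& v) { return {&v, SqlType::Double, sizeof v}; }

// Character arrays keep their declared size through the template parameter.
template <std::size_t N>
inline HostParam hostParam(char (&v)[N]) { return {v, SqlType::Char, N}; }
```

A generated call such as `hostParam(cust.name)` then compiles to the right overload whether `cust` was declared directly or through a typedef'd structure.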

Declaring a SELECT Statement

The pre-compiler creates a static variable holding the SQL statement text, along with additional flags to track whether the statement has been prepared. For MySQL, host variable names are kept in place. For ODBC, host variable names are replaced with a question mark (?) placeholder — the only significant difference between the two generated code paths.
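The sketch below shows one plausible shape for the emitted statement variable. `StaticStatement`, `needsPrepare()` and the `customer` table are illustrative assumptions; only the idea of static SQL text plus a prepared flag, with `:cust_id` kept for MySQL and `?` used for ODBC, comes from the description above.

```cpp
#include <string_view>

// Hypothetical per-statement record the pre-compiler could emit for
//   EXEC SQL DECLARE c1 CURSOR FOR
//     SELECT name FROM customer WHERE id = :cust_id;
struct StaticStatement {
    std::string_view text;   // SQL text fixed at pre-compile time
    bool prepared;           // becomes true once the statement is prepared
};

// MySQL variant: the host variable name stays in the text.
static StaticStatement c1_mysql = {
    "SELECT name FROM customer WHERE id = :cust_id", false};

// ODBC variant: the same statement with a "?" parameter marker.
static StaticStatement c1_odbc = {
    "SELECT name FROM customer WHERE id = ?", false};

// Returns true the first time it is called for a statement and marks it
// prepared. A real implementation would set the flag only after the
// prepare call has succeeded.
inline bool needsPrepare(StaticStatement& s) {
    if (s.prepared) return false;
    s.prepared = true;
    return true;
}
```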

Executing a SELECT Statement

When the program opens the ESQL cursor, the SELECT statement is executed. The first step is always to ensure a valid database connection. Systems built on precompiled and bound SQL traditionally handle connections implicitly; here, that implicit behavior must be made explicit. For ODBC, the statement is prepared and the host variables are bound before execution begins.
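To show the order of operations on OPEN without tying the example to a real driver, the sketch uses a hypothetical `Backend` interface; a real build would implement it with MySQL C API calls or ODBC calls such as `SQLPrepare`, `SQLBindParameter` and `SQLExecute`. The negative return codes are arbitrary placeholders.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal backend interface; names are illustrative only.
struct Backend {
    virtual ~Backend() = default;
    virtual bool connected() const = 0;
    virtual bool connect() = 0;
    virtual bool prepare(const std::string& sql) = 0;
    virtual bool bind(int position, const std::string& value) = 0;
    virtual bool execute() = 0;
};

// Sketch of the code "EXEC SQL OPEN c1" could turn into.
// Returns 0 on success and a negative value on failure.
inline int openCursor(Backend& db, const std::string& sql, bool& prepared,
                      const std::vector<std::string>& params) {
    // The original environment connected implicitly; here it is explicit.
    if (!db.connected() && !db.connect()) return -1;
    // Prepare only once per statement (a reconnect would need a reset).
    if (!prepared) {
        if (!db.prepare(sql)) return -2;
        prepared = true;
    }
    // ODBC parameter markers are numbered from 1.
    for (std::size_t i = 0; i < params.size(); ++i)
        if (!db.bind(static_cast<int>(i + 1), params[i])) return -3;
    return db.execute() ? 0 : -4;
}
```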

Fetching Data

The FETCH statement specifies the target variables for incoming data. Because polymorphism is used throughout the generated C++ layer, the compiler determines the correct function to call based on variable type — no manual type specification is needed at fetch time. This keeps the fetch logic clean and consistent across database backends.
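MySQL's C API returns each fetched row as an array of C strings (`MYSQL_ROW`), so one way to realize the polymorphic fetch is a set of overloads that convert a column into whatever type the host variable has. `assignColumn()` and `fetchRow()` are hypothetical names, and NULL handling through indicator variables is left out for brevity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Convert one column (a C string, or nullptr for SQL NULL) into a host
// variable. The compiler chooses the overload from the variable's type.
inline void assignColumn(const char* s, int& out)    { out = s ? static_cast<int>(std::strtol(s, nullptr, 10)) : 0; }
inline void assignColumn(const char* s, long& out)   { out = s ? std::strtol(s, nullptr, 10) : 0L; }
inline void assignColumn(const char* s, double& out) { out = s ? std::strtod(s, nullptr) : 0.0; }

// Character arrays: copy with truncation and guaranteed NUL termination.
template <std::size_t N>
inline void assignColumn(const char* s, char (&out)[N]) {
    std::size_t n = s ? std::strlen(s) : 0;
    if (n >= N) n = N - 1;
    if (n) std::memcpy(out, s, n);
    out[n] = '\0';
}

// "EXEC SQL FETCH c1 INTO :a, :b, :c" could become fetchRow(row, a, b, c).
// The comma fold visits the host variables left to right, one column each.
template <typename... Vars>
inline void fetchRow(const char* const* row, Vars&... vars) {
    std::size_t col = 0;
    (assignColumn(row[col++], vars), ...);
}
```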

Closing the Cursor and Handling DML

Closing the cursor is the simplest step: a single generated call translates directly to the MySQL or ODBC close equivalent. INSERT, UPDATE, and DELETE statements are handled in a relatively straightforward manner. The SELECT...INTO pattern — a single-row read — is treated as a special case and handled separately.
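SELECT...INTO differs from a cursor in that the generated code must check that exactly one row came back. The hedged sketch below uses an in-memory result so it runs standalone; `selectInto()` is a hypothetical name. The values 0 and 100 follow the common SQLCODE convention for success and "no data found", while the code used for "more than one row" is an assumption, since databases report that case differently.

```cpp
#include <string>
#include <vector>

constexpr int kSqlOk = 0;
constexpr int kSqlNotFound = 100;
constexpr int kSqlTooManyRows = -1;   // assumed value; database-specific in practice

using Row = std::vector<std::string>;

// Single-row read: run like a cursor, then insist on exactly one row.
inline int selectInto(const std::vector<Row>& result, Row& out) {
    if (result.empty()) return kSqlNotFound;
    if (result.size() > 1) return kSqlTooManyRows;
    out = result.front();
    return kSqlOk;
}
```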

Implementation: Inside the awk Script

How the awk Code Works

The pre-compiler runs on GNU awk (gawk) and clocks in at under 1,000 lines including comments — a compact footprint for the functionality it delivers. It relies heavily on regular expressions to locate ESQL constructs within the source file and to manipulate SQL statement text. The getStatement() function reads through the input until it reaches the terminating semicolon of the SQL statement, handling both single and double quotes inside SQL correctly (unbalanced quotes are a known limitation). The function returns the entire SQL statement as a single string for further processing.
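The awk source is not reproduced here, so the following is a C++ sketch of the same scanning idea rather than the original function: read lines until a semicolon appears outside single or double quotes. As in the original, an unbalanced quote swallows the rest of the input; this version then reports failure with `std::nullopt`.

```cpp
#include <istream>
#include <optional>
#include <sstream>   // std::istringstream, handy for feeding test input
#include <string>

// Read from 'in' until a semicolon outside quotes; return the text before
// it, with continuation lines joined by a blank. Anything after the
// terminating semicolon on the same line is ignored in this sketch.
inline std::optional<std::string> getStatement(std::istream& in) {
    std::string stmt, line;
    char quote = 0;                          // 0, '\'' or '"'
    while (std::getline(in, line)) {
        for (char c : line) {
            if (quote) {
                if (c == quote) quote = 0;   // '' closes and reopens: still correct
            } else if (c == '\'' || c == '"') {
                quote = c;
            } else if (c == ';') {
                return stmt;
            }
            stmt += c;
        }
        stmt += ' ';
    }
    return std::nullopt;                     // no terminating semicolon found
}
```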

Several routines normalize the SQL to plain vanilla syntax: they strip Tandem SQL/MP-specific constructs, convert ODBC host variables to question marks, and recognize host variables via regular expressions. The bulk of the awk code generates the required inline database call code, with additional options available for debug output.
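Host variable recognition and the ODBC conversion can be sketched with `std::regex`. The pattern `:identifier` is an assumption about what counts as a host variable, and `toOdbc()` is a hypothetical name. The sketch ignores indicator variables and, as its comment notes, does not skip quoted literals or PostgreSQL's `::` casts.

```cpp
#include <regex>
#include <string>
#include <vector>

// Result of normalizing one statement for ODBC.
struct OdbcStatement {
    std::string sql;                    // text with "?" parameter markers
    std::vector<std::string> hostVars;  // host variable names, in marker order
};

// Replace each ":name" with "?" and record the names so the generated code
// can bind them in order. Simplification: quoted literals are not skipped,
// so a literal such as 'a:b' would be altered.
inline OdbcStatement toOdbc(const std::string& sql) {
    static const std::regex hostVar(R"(:([A-Za-z_][A-Za-z0-9_]*))");
    OdbcStatement out;
    auto last = sql.cbegin();
    for (std::sregex_iterator it(sql.begin(), sql.end(), hostVar), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.sql.append(last, m[0].first);   // text before the host variable
        out.sql += '?';
        out.hostVars.push_back(m[1].str());
        last = m[0].second;
    }
    out.sql.append(last, sql.cend());       // trailing text
    return out;
}
```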


Helper Routines and Type Wrapping

To allow the C++ compiler to match host variable types at fetch time, the API is wrapped with inline functions. For MySQL, parameters are substituted directly into the statement string. For ODBC query parameters, a separate binding mechanism is used. This wrapper layer is the key abstraction that makes the polymorphism approach work without needing a full C parser.
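For the MySQL path, substituting a value means rendering it as SQL literal text. The hypothetical `sqlLiteral()` overloads and `substitute()` helper below illustrate that. The hand-written escaping assumes MySQL's default backslash escaping; real code would rather call `mysql_real_escape_string()` on the open connection, which also respects the character set.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Render a host variable as MySQL literal text (simplified quoting).
inline std::string sqlLiteral(int v)  { return std::to_string(v); }
inline std::string sqlLiteral(long v) { return std::to_string(v); }
inline std::string sqlLiteral(const char* s) {
    std::string out = "'";
    for (const char* p = s; *p; ++p) {
        if (*p == '\'' || *p == '\\') out += '\\';   // escape quote and backslash
        out += *p;
    }
    return out + "'";
}

// Replace every ":name" with the literal for 'value'. A match must not be
// followed by another identifier character, so ":id" leaves ":id2" alone.
template <typename T>
inline std::string substitute(std::string sql, const std::string& name, const T& value) {
    const std::string token = ":" + name;
    const std::string lit = sqlLiteral(value);
    std::size_t pos = 0;
    while ((pos = sql.find(token, pos)) != std::string::npos) {
        std::size_t after = pos + token.size();
        bool boundary = after == sql.size() ||
            !(std::isalnum(static_cast<unsigned char>(sql[after])) || sql[after] == '_');
        if (boundary) { sql.replace(pos, token.size(), lit); pos += lit.size(); }
        else          { pos = after; }
    }
    return sql;
}
```

Character-array host variables decay to `const char*` inside `substitute()`, so they reach the quoting overload without any extra declaration parsing.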

Dialect Normalization and Debugging

The pre-compiler smooths over minor SQL dialect differences between database systems. NonStop SQL by HP, for example, requires dates to be formatted differently from standard SQL. To preserve a smooth development experience, the pre-compiler emits #line directives so that debuggers point back to the original source file rather than the generated output. Indentation from the original code is also preserved in the generated files.
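A small sketch of how generated code can be bracketed with `#line` directives. The helper names and the file name `order.ec` are hypothetical. `#line N "file"` makes the compiler treat the next source line as line N of that file, so emitting `lastLine + 1` after the replacement keeps the remaining original lines correctly numbered.

```cpp
#include <string>

// Build the directive that precedes code generated for original line n.
// File names containing backslashes or quotes would need escaping.
inline std::string lineDirective(int n, const std::string& file) {
    return "#line " + std::to_string(n) + " \"" + file + "\"\n";
}

// Emit generated code for an EXEC SQL statement spanning firstLine..lastLine
// of the original file, keeping the statement's original indentation.
inline std::string emitReplacement(const std::string& file, int firstLine, int lastLine,
                                   const std::string& indent, const std::string& code) {
    return lineDirective(firstLine, file) + indent + code + "\n"
         + lineDirective(lastLine + 1, file);
}
```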

What's Next: The To-Do List

The current implementation works well for its intended use case, but several improvements are planned:

The code is not especially generic — a separate awk script is currently required for each target database. Consolidating these is a priority.
A rewrite in Perl is under consideration to improve accessibility and traction in the open-source community.
Application-specific code will be extracted and isolated to make the core pre-compiler reusable.
More robust pattern matching is needed for a truly generic pre-compiler. A yacc grammar will deliberately be avoided to prevent complexity overhead and unintended side effects.

Conclusion

This pre-compiler is actively used to build a large production system running on Linux IA-64 with MySQL. Getting from the start of the port to a fully running system on MySQL took just two days, using ESQL/C. For teams facing similar cross-database porting challenges, an awk-based pre-compiler is a low-cost, high-leverage approach that keeps existing SQL modules intact.

FAQ: ESQL Pre-compiler for Cross-Database Compatibility

What is an ESQL pre-compiler and why would I need one?

An ESQL pre-compiler translates embedded SQL statements written inside a host language (like C or C++) into database-specific API calls. You need one when your target database — such as MySQL — does not ship with its own pre-compiler, and rewriting all SQL modules to native APIs is not practical.

Which databases does this awk-based pre-compiler support?

The pre-compiler has separate awk script variants for MySQL, ODBC, Oracle, and PostgreSQL. NonStop SQL/MP databases require additional coding and minor adjustments when porting from other systems.

Why was awk chosen for implementing the pre-compiler?

Awk is well-suited for line-by-line text processing and regular expression matching, which are exactly the operations needed to parse and transform ESQL source files. The completed script is under 1,000 lines including comments, making it compact and maintainable.

How does the pre-compiler handle C++ type matching without a full C parser?

Instead of fully parsing C declarations, the pre-compiler generates C++ code that wraps the database API with inline functions using polymorphism. This offloads type resolution to the C++ compiler, which can determine the correct function to call at compile time.

What is the difference between the MySQL and ODBC code generated by the pre-compiler?

The main difference is in how host variables are handled. For MySQL, host variable names are kept in the SQL statement. For ODBC, they are replaced with a question mark (?) placeholder, and variables are bound separately before execution.

Does the pre-compiler support debugging against the original source file?

Yes. The pre-compiler emits #line directives in the generated code, so debuggers will reference the original ESQL source file rather than the generated output. The original indentation pattern is also preserved.

How are SQL dialect differences between databases handled?

The pre-compiler includes normalization routines that strip database-specific syntax — for example, Tandem SQL/MP constructs — and convert the SQL to a plain standard form. Date handling differences, such as those required by NonStop SQL from HP, are addressed through specific pre-compiler logic.

What are the known limitations of the current pre-compiler?

The implementation is not fully generic: a separate script is needed per target database, and the codebase contains application-specific logic. Unbalanced quotes inside SQL strings can also cause parsing issues in the getStatement() function.

Is a Perl rewrite planned, and what would that achieve?

A Perl rewrite is under consideration. Perl would offer more robust text processing capabilities and greater visibility in the open-source community, potentially making the pre-compiler more accessible as a reusable tool for other projects.

How long does it take to port a large codebase using this pre-compiler?

In the documented case, a production system with approximately 150 ESQL modules was successfully ported to MySQL on Linux IA-64 in just two days using the ESQL/C approach enabled by this pre-compiler.