Building a Simple SQL Query Generator Using LLMs
Last Updated on February 21, 2026 by Editorial Team

Author(s): Sainath Udata

Originally published on Towards AI.

Writing SQL is an art, but not everyone enjoys the canvas. For many, the syntax of a complex JOIN or a nested subquery is a barrier to getting the data they need. In our fast-paced world, not everyone has the time to become a database administrator, and with the power of Large Language Models (LLMs), you don't have to. Instead of teaching you how to write queries, I'm going to show you how to build a tool that writes them for you. We will use a Python-based extractor to feed your database schema to an LLM, turning a simple English question into a functional SQL statement.

The Tech Stack

We'll use MS SQL Server for this walkthrough, but the logic applies to any relational database (Postgres, MySQL, etc.) as long as you use the correct connector library.

- dotenv: To securely load your environment variables and API keys.
- mssql-python: Our bridge to MS SQL Server.
- openai: To connect to your LLM of choice (e.g., GPT-4).
- sqlparse: To clean up and format the SQL generated by the LLM.
- uv: A blazing-fast Python package manager to handle our environment.

Setting Up Your Environment

To get started, initialize your project using uv or your preferred manager.

```toml
# pyproject.toml
[project]
name = "simple_sql_query_generator"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "dotenv>=0.9.9",
    "mssql-python>=1.3.0",
    "openai>=2.21.0",
    "sqlparse>=0.5.0",
]
```

Create a pyproject.toml file with the contents above and run the command uv sync to install these dependencies and create your virtual environment. Also create a .env file to hold all your keys and environment details:

```
# .env
DB_HOST=<DB HOST>
DB_PORT=<DB PORT>
DB_NAME=<DB NAME>
DB_USER=<DB USER>
DB_PASSWORD=<DB PASSWORD>
OLLAMA_API_KEY=ollama
OLLAMA_BASE_URL="http://localhost:11434/v1"
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GROK_API_KEY=
GROQ_API_KEY=
```

Preparing the Playground

An LLM can't guess what's in your database.
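Before going further, it's worth checking that the .env values load cleanly. The sketch below uses the dotenv package from the stack above; the variable names match the .env file, and `load_settings` is a helper name I'm introducing for illustration:

```python
# check_env.py -- quick sanity check that the environment is wired up.
# load_settings is an illustrative helper, not part of the article's code.
import os


def load_settings() -> dict:
    """Read the database settings from the environment into one dict."""
    try:
        from dotenv import load_dotenv  # part of the article's stack
        load_dotenv()  # reads .env from the current directory
    except ImportError:
        pass  # fall back to variables already set in the environment
    required = ["DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD"]
    missing = [k for k in required if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing settings in .env: {missing}")
    return {k: os.getenv(k) for k in required}
```

With the environment in place, the next step is giving the LLM real context about your database.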
It needs context: table names, columns, and relationships. To demonstrate, let's create a classic Customers and Orders scenario.

```sql
-- sql_script.sql
-- 1. Create Customers Table
CREATE TABLE customers (
    customer_id INT NOT NULL PRIMARY KEY,
    name NVARCHAR(255) NULL,
    email NVARCHAR(255) NULL,
    created_at DATETIME2 NULL DEFAULT GETDATE()
);

-- 2. Create Orders Table
CREATE TABLE orders (
    order_id INT NOT NULL PRIMARY KEY,
    customer_id INT NULL,
    order_date DATE NULL,
    total_amount DECIMAL(18, 2) NULL,
    -- Define Foreign Key relationship
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (customer_id)
        REFERENCES customers(customer_id)
);

-- 3. Insert Sample Records
INSERT INTO customers (customer_id, name, email, created_at)
VALUES
(1, 'Alice Johnson', 'alice@example.com', '2025-01-15 10:30:00'),
(2, 'Bob Smith', 'bob@example.com', '2025-02-20 14:45:00'),
(3, 'Charlie Brown', 'charlie@example.com', '2025-03-05 09:15:00'),
(4, 'Diana Prince', 'diana@themyscira.com', '2025-07-12 08:00:00'),
(5, 'Edward Nigma', 'riddler@gotham.com', '2025-08-01 11:20:00'),
(6, 'Fiona Gallagher', 'fiona@southside.com', '2025-09-14 16:30:00'),
(7, 'George Costanza', 'art_vandelay@latex.com', '2025-10-10 12:00:00'),
(8, 'Hannah Abbott', 'hannah@hufflepuff.edu', '2025-11-22 09:45:00'),
(9, 'Ian Malcolm', 'chaos@jurassic.org', '2025-12-05 13:10:00'),
(10, 'Julia Child', 'cooking@french.com', '2026-01-10 15:55:00');

INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES
(101, 1, '2025-04-01', 150.50),
(102, 1, '2025-04-15', 89.99),
(103, 2, '2025-05-20', 210.00),
(104, 3, '2025-06-10', 45.00),
(105, 4, '2025-07-20', 500.00),
(106, 5, '2025-08-05', 12.50),
(107, 5, '2025-08-15', 35.75),
(108, 6, '2025-09-20', 99.99),
(109, 1, '2025-10-01', 25.00),   -- Customer 1 again
(110, 2, '2025-11-05', 300.25),  -- Customer 2 again
(111, 7, '2025-12-01', 15.00),
(112, 4, '2025-12-15', 750.00),  -- High value order
(113, 8, '2026-01-05', 62.40),
(114, 10, '2026-01-20', 120.00),
(115, 1, '2026-02-01', 10.00),   -- Recent order
(116, 5, '2026-02-10', 55.00);
```

The “Secret Sauce”: Automating Schema Extraction

For an LLM to generate valid SQL, it needs more than just table names; it needs to understand the “connective tissue” of your database: primary keys, foreign keys, and data types. While this isn't a SQL class, we use a few standard T-SQL queries to “teach” the LLM our database structure. By querying INFORMATION_SCHEMA, we can build a Markdown map:

- Tables: We fetch all BASE TABLE entries to avoid system views.
- Columns: We grab types and nullability to help the LLM understand what data it's dealing with.
- Keys: We identify primary and foreign keys, so the LLM knows exactly how to JOIN tables without guessing.

The result looks like this:

```markdown
## Table: customers
Columns:
- customer_id: int NOT NULL (PRIMARY KEY)
- name: nvarchar NULL
- email: nvarchar NULL
- created_at: datetime2 NULL

## Table: orders
Columns:
- order_id: int NOT NULL (PRIMARY KEY)
- customer_id: int NULL
- order_date: date NULL
- total_amount: decimal NULL
Foreign Keys:
- customer_id → customers.customer_id
```

We wrap the queries into a Python class that does the heavy lifting. This class connects to your DB, runs the metadata queries, and outputs a clean Markdown string.
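To make the target format concrete before looking at the full class, here is a small standalone helper that renders per-table metadata (shaped like the dicts the extractor below returns) into the Markdown above. This is a sketch; `format_schema_markdown` is a name I'm introducing, not part of the article's code:

```python
# Hypothetical helper: turns extracted table metadata into the Markdown
# map shown above. Input shapes match what the extractor class returns.

def format_schema_markdown(schema: dict, foreign_keys: list[dict]) -> str:
    """Render one table's metadata as a Markdown block for the LLM prompt."""
    lines = [f"## Table: {schema['table_name']}", "Columns:"]
    for col in schema['columns']:
        null_txt = "NULL" if col['nullable'] else "NOT NULL"
        pk_txt = " (PRIMARY KEY)" if col['name'] in schema['primary_keys'] else ""
        lines.append(f"- {col['name']}: {col['type']} {null_txt}{pk_txt}")
    if foreign_keys:
        lines.append("Foreign Keys:")
        for fk in foreign_keys:
            lines.append(
                f"- {fk['column_name']} → "
                f"{fk['referenced_table']}.{fk['referenced_column']}"
            )
    return "\n".join(lines)
```

Concatenating one such block per table gives the LLM the full map of the database in a compact, readable form.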
```python
# schema_extractor.py
from mssql_python import connect
from typing import Dict, List, Optional


class SchemaExtractor:
    def __init__(self, connection_string):
        # Using the mssql-python connect method
        if isinstance(connection_string, dict):
            # Unpacks dict keys into: connect(SERVER='...', DATABASE='...')
            self.conn = connect(**connection_string)
        else:
            # Treats it as a standard connection string
            self.conn = connect(connection_string)
        self.cursor = self.conn.cursor()

    def get_tables(self, include_tables: Optional[List[str]] = None) -> List[str]:
        """Get specific or all table names in the database."""
        if include_tables:
            # Format list for T-SQL IN clause: 'table1', 'table2'
            placeholders = ", ".join([f"'{t}'" for t in include_tables])
            filter_clause = f"AND table_name IN ({placeholders})"
        else:
            filter_clause = ""

        query = f"""
            SELECT table_name
            FROM information_schema.tables
            WHERE table_type = 'BASE TABLE'
              AND table_schema = 'dbo'
              {filter_clause}
            ORDER BY table_name;
        """
        self.cursor.execute(query)
        return [row[0] for row in self.cursor.fetchall()]

    def get_table_schema(self, table_name: str) -> Dict:
        """Get detailed schema for a specific MSSQL table."""
        # 1. Get Columns
        query = """
            SELECT column_name, data_type, is_nullable, column_default
            FROM information_schema.columns
            WHERE table_name = ?
            ORDER BY ordinal_position;
        """
        self.cursor.execute(query, (table_name,))
        columns = []
        for row in self.cursor.fetchall():
            columns.append({
                'name': row[0],  # Accessing by index
                'type': row[1],
                'nullable': row[2] == 'YES',
                'default': row[3]
            })

        # 2. Get Primary Keys (Improved Query)
        pk_query = """
            SELECT COLUMN_NAME
            FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
            WHERE OBJECTPROPERTY(
                    OBJECT_ID(CONSTRAINT_SCHEMA + '.' + CONSTRAINT_NAME),
                    'IsPrimaryKey') = 1
              AND TABLE_NAME = ?;
        """
        self.cursor.execute(pk_query, (table_name,))
        # fetchall() returns a list of tuples like [('id',), ('other_pk',)]
        primary_keys = [row[0] for row in self.cursor.fetchall()]

        return {
            'table_name': table_name,
            'columns': columns,
            'primary_keys': primary_keys
        }

    def get_foreign_keys(self, table_name: str) -> List[Dict]:
        """Get foreign key relationships in MSSQL."""
        query = """
            SELECT
                cp.name AS column_name,
                tr.name AS referenced_table,
                cr.name AS referenced_column
            FROM sys.foreign_keys AS fk
            INNER JOIN sys.foreign_key_columns AS fkc
                ON fk.object_id = fkc.constraint_object_id
            INNER JOIN sys.tables AS tp
                ON fkc.parent_object_id = tp.object_id
            INNER JOIN sys.columns AS cp
                ON fkc.parent_object_id = cp.object_id
                AND fkc.parent_column_id = cp.column_id
            INNER JOIN sys.tables AS tr
                ON fkc.referenced_object_id = tr.object_id
            INNER JOIN sys.columns AS cr
                ON fkc.referenced_object_id […]
```
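With the schema extracted, the remaining step is to hand that Markdown to the LLM along with the user's question, and clean up whatever comes back. The sketch below assumes the stack described earlier (the openai client's chat completions API and sqlparse for formatting); `build_prompt`, `strip_fences`, and `generate_sql` are helper names I'm introducing, not the article's final code:

```python
# query_generator.py -- a sketch of the generation step, assuming the
# SchemaExtractor above. Helper names here are illustrative.
import os
import re


def build_prompt(schema_markdown: str, question: str) -> list[dict]:
    """Assemble chat messages: schema as system context, question as the task."""
    system = (
        "You are a T-SQL expert. Given the database schema below, "
        "answer with a single valid SQL query and nothing else.\n\n"
        f"{schema_markdown}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


def strip_fences(text: str) -> str:
    """LLMs often wrap SQL in ```sql ... ``` fences; keep only the query."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else text).strip()


def generate_sql(schema_markdown: str, question: str) -> str:
    # Imported lazily so the pure helpers above work without these packages.
    from openai import OpenAI
    import sqlparse

    client = OpenAI(
        base_url=os.getenv("OLLAMA_BASE_URL"),  # or omit for api.openai.com
        api_key=os.getenv("OLLAMA_API_KEY", "ollama"),
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model your endpoint serves
        messages=build_prompt(schema_markdown, question),
    )
    raw = response.choices[0].message.content
    # sqlparse tidies the output: reindents and upper-cases keywords.
    return sqlparse.format(strip_fences(raw), reindent=True, keyword_case="upper")
```

A question like "Who are my top three customers by total spend?" then comes back as a ready-to-run, nicely formatted SELECT with the JOIN inferred from the foreign keys in the schema map.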