如何使用GPT作为SQL查询引擎的自然语言( 二 )

步骤3:将数据发送到OpenAI的API用户可以先将数据剪切并粘贴到OpenAI的Web界面中 , 然后在ChatGPT或OpenAI API中查看结果 。ChatGPT不收取使用费用,但用户不能调整其结果 。可以让用户设置温度之类的参数,这意味着其反应应该有多“随机”或多有创意,以及服务商想使用哪种模型 。对于SQL代码 , 将温度设置为0 。
接下来,将一个自然语言问题保存到变量my_query中 , 使用create_prompt()函数创建一个提示符,然后观察当将该提示符粘贴到API playground中时会发生什么:
> my_query <- "What were the highest and lowest Population changes in 2020 by Division?"> my_prompt <- get_query(states_schema_string, states_sample_string, my_query, "states")> cat(my_prompt)Act as if you're a data scientist. You have a SQLite table named states with the following schema: ```0 State TEXT 0 NA 01 Pop_2000 INTEGER 0 NA 02 Pop_2010 INTEGER 0 NA 03 Pop_2020 INTEGER 0 NA 04 PctChange_2000 REAL 0 NA 05 PctChange_2010 REAL 0 NA 06 PctChange_2020 REAL 0 NA 07 State Code TEXT 0 NA 08 Region TEXT 0 NA 09 Division TEXT 0 NA 0```The first rows look like this: ```Delaware783600897934989948 17.6 14.6 10.2 DE South South AtlanticMontana902195989415 1084225 12.99.79.6 MT West MountainArizona 5130632 6392017 7151502 40.0 24.6 11.9 AZ West Mountain```Based on this data, write a SQL query to answer the following question: What were the highest and lowest Population changes in 2020 by Division?. Return the SQL query ONLY. Do not include any additional explanation.

如何使用GPT作为SQL查询引擎的自然语言

文章插图
提示输入OpenAI API playground和生成的SQL代码
以下是运行建议的SQL时的结果:
sqldf("SELECT Division, MAX(PctChange_2020) AS Highest_PctChange_2020, MIN(PctChange_2020) AS Lowest_PctChange_2020 FROM states GROUP BY Division;") Division Highest_PctChange_2020 Lowest_PctChange_20201 East North Central 4.7 -0.12 East South Central 8.9 -0.23 Middle Atlantic 5.7 2.44 Mountain 18.4 2.35 New England 7.4 0.96 Pacific 14.6 3.37 South Atlantic 14.6 -3.28 West North Central 15.8 2.89 West South Central 15.9 2.7ChatGPT不仅生成了准确的SQL , 而且也不必告诉GPT“2020人口变化”在Pop_2020列中 。
步骤4:执行GPT返回的SQL代码的结果以编程方式向OpenAI发送和返回数据,而不是将其剪切和粘贴到Web界面中,这将会方便得多 。有几个R包可以使用OpenAI API 。下面的代码块使用OpenAI包向API发送一个提示,存储API响应,提取响应中包含带有请求的SQL代码的文本的部分,打印该代码,并在数据上运行SQL 。
library(openai)my_results <- openai::create_chat_completion(model = "gpt-3.5-turbo", temperature = 0, messages = list( list(role = "user", content = my_prompt))) the_answer <- my_results$choices$message.contentcat(the_answer)SELECT Division, MAX(PctChange_2020) AS Highest_Population_Change, MIN(PctChange_2020) AS Lowest_Population_ChangeFROM statesGROUP BY Division;sqldf(the_answer) Division Highest_Population_Change Lowest_Population_Change1 East North Central 4.7 -0.12 East South Central 8.9 -0.23 Middle Atlantic 5.7 2.44 Mountain 18.4 2.35 New England 7.4 0.96 Pacific 14.6 3.37 South Atlantic 14.6 -3.28 West North Central 15.8 2.89 West South Central 15.9 如果用户想使用OpenAI API,需要一个OpenAI API密钥 。对于这个包 , 密钥应该存储在一个系统环境变量中,例如OPENAI_API_KEY 。需要注意的是,这个API不是免费使用的,但在把它变成编辑器之前,一天运行了这个项目十几次,而其总账户使用的费用是1美分 。
步骤5(可选):创建交互式应用程序现在,已经在脚本或终端中拥有了在R工作流中运行查询所需的所有代码 。但是,如果想用简单的语言制作一个交互式应用程序来查询数据 , 这里已经包含了一个基本的Shiny应用程序的代码,可以使用它 。
如果打算发布一个应用程序供其他人使用 , 那么将需要加强代码安全性以防止恶意查询,添加更优雅的错误处理和解释性标签 , 改进样式,或者对其进行扩展以供企业使用 。
与同时,这段代码应该开始创建一个交互式应用程序,用自然语言查询数据集:
library(shiny)library(openai)library(dplyr)library(sqldf)# Load hard-coded datasetstates <- read.csv("states.csv") |> dplyr::filter(!is.na(Region) & Region != "")states_schema <- sqldf::sqldf("PRAGMA table_info(states)")states_schema_string <- paste(apply(states_schema, 1, paste, collapse = "t"), collapse = "n")states_sample <- dplyr::sample_n(states, 3)states_sample_string <- paste(apply(states_sample, 1, paste, collapse = "t"), collapse = "n")# Function to process user inputget_prompt <- function(query, schema = states_schema_string, rows_sample = states_sample_string, table_name = "states") { my_prompt <- glue::glue("Act as if you're a data scientist. You have a SQLite table named {table_name} with the following schema: ``` {schema} ``` The first rows look like this:```{rows_sample}``` Based on this data, write a SQL query to answer the following question: {query} Return the SQL query ONLY. Do not include any additional explanation.") print(my_prompt) return(my_prompt)}ui <- fluidPage( titlePanel("Query state database"), sidebarLayout( sidebarPanel( textInput("query", "Enter your query", placeholder = "e.g., What is the total 2020 population by Region?"), actionButton("submit_btn", "Submit") ), mainPanel( uiOutput("the_sql"), br(), br(), verbatimTextOutput("results") ) ))server <- function(input, output) {# Create the prompt from the user query to send to GPT the_prompt <- eventReactive(input$submit_btn, { req(input$query, states_schema_string, states_sample_string) my_prompt <- get_prompt(query = input$query) }) # send prompt to GPT, get SQL, run SQL, print resultsobserveEvent(input$submit_btn, { req(the_prompt()) # text to send to GPT # Send results to GPT and get response # withProgress adds a Shiny progress bar. Commas now needed after each statement withProgress(message = 'Getting results from GPT', value = https://www.isolves.com/it/ai/2023-10-27/0, { # Add Shiny progress message my_results


推荐阅读