Analyzing Stock Tick Data in SingleStoreDB Using LangChain and OpenAI’s Whisper

Disclaimer

The stock data used in this article is entirely fictitious. It is purely for demo purposes. Please do not use this data for making any financial decisions.

This article will show how to connect a Kafka broker, streaming example stock tick data, to SingleStoreDB. We’ll then query the data using English sentences through LangChain, which provides a basic question-and-answer capability for the tick data. We’ll build a Python application, through several design iterations, to use OpenAI’s Whisper to ask questions through speech and use speech synthesis to reply.

The notebook file, SQL, and Python code are available on GitHub.

Introduction

The ability to ask questions of a database system using natural language is not new. However, it has become much easier to implement with modern tools like LangChain and OpenAI’s Whisper. In this article, we’ll see how.

Create a SingleStoreDB Cloud Account

A previous article showed the steps to create a free SingleStoreDB Cloud account. We’ll use the following settings:

  • Workspace Group Name: Whisper Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: whisper-demo
  • Size: S-00
  • Settings: None selected

Once the workspace is available, we’ll make a note of our password and host. The host will be available from Whisper Demo Group > Overview > Workspaces > whisper-demo > Connect > Connect Directly > SQL IDE > Host. We’ll need this information later for a Python application. We’ll temporarily allow access from anywhere by configuring the firewall under Whisper Demo Group > Firewall.

Create a Database and Tables

From the left navigation pane, we’ll select DEVELOP > SQL Editor to create a timeseries_db database and the tick and stock_sentiment tables, as follows:

CREATE DATABASE IF NOT EXISTS timeseries_db;

USE timeseries_db;

CREATE TABLE IF NOT EXISTS tick (
  symbol VARCHAR(10),
  ts DATETIME SERIES TIMESTAMP,
  open NUMERIC(18, 2),
  high NUMERIC(18, 2),
  low NUMERIC(18, 2),
  price NUMERIC(18, 2),
  volume INT,
  KEY(ts)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
  headline VARCHAR(250),
  positive FLOAT,
  negative FLOAT,
  neutral FLOAT,
  url TEXT,
  publisher VARCHAR(30),
  ts DATETIME,
  symbol VARCHAR(10)
);

Create a Pipeline

Pipelines allow us to create streaming ingest feeds from various sources, such as Kafka, S3 and HDFS, using a single command. With pipelines, we can also perform ETL operations. For our use case, we’ll create a simple pipeline in SingleStoreDB as follows:

CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 2500
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);

We’ll control the rate of data ingestion using the BATCH_INTERVAL. Initially, we’ll set this to 2500 milliseconds.
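If we later want to slow ingestion down or speed it up, the interval can be changed with ALTER PIPELINE. This is a sketch, not a step from the original walkthrough; the pipeline must be stopped before it can be altered:

ALTER PIPELINE tick SET BATCH_INTERVAL 5000;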

We’ll configure the pipeline to start from the earliest offset, as follows:

ALTER PIPELINE tick SET OFFSETS EARLIEST;

and we’ll test the pipeline before we start running it, as follows:

TEST PIPELINE tick LIMIT 1;

The output should be similar to the following:

+--------+---------------------+-------+-------+-------+-------+--------+
| symbol | ts                  | open  | high  | low   | price | volume |
+--------+---------------------+-------+-------+-------+-------+--------+
| AIV    | 2023-09-05 06:47:53 | 44.89 | 44.89 | 44.88 | 44.89 | 719    |
+--------+---------------------+-------+-------+-------+-------+--------+

Remember, this is fictitious data.

We’ll now start the pipeline:
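START PIPELINE tick;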

Load Stock Sentiment Data

We’ll now load the data into the stock_sentiment table. The data are derived from a GitHub Gist and are a small subset of the data from Kaggle. The CSV file already includes the sentiment values. We’ll download this file and then use a MySQL client to connect to SingleStoreDB Cloud, as follows:

mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p

Replace <host> with the value we obtained from SingleStoreDB Cloud earlier. We’ll load the data into SingleStoreDB from the CSV file, as follows:

USE timeseries_db;

LOAD DATA LOCAL INFILE '/path/to/stock_sentiment.csv'
INTO TABLE stock_sentiment
COLUMNS TERMINATED BY '\t'
IGNORE 1 LINES;

Replace /path/to/ with the actual path to where the CSV file is located.

Load the Notebook

We’ll use the notebook file available on GitHub. From the left navigation pane in SingleStoreDB Cloud, we’ll select Notebooks. In the top right of the web page is New Notebook, with a pulldown that has two options:

  1. New Notebook
  2. Import From File

We’ll select the second option, locate the notebook file we downloaded from GitHub and load it into SingleStoreDB Cloud.

Run the Notebook

We’ll start by selecting the Connection (whisper-demo) and Database (timeseries_db) using the drop-down menus above the notebook.

Analyzing Time Series Data

Part 1 of the notebook contains a set of Time Series operations on the data in the tick table. These operations were described in greater detail in a previous article.
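As a flavour of these operations, here is a sketch of a time-bucketed aggregation, assuming the tick schema defined above; it is an illustration, not a query from the notebook:

-- A sketch: average price per symbol in 1-minute buckets.
-- With no time argument, TIME_BUCKET uses the SERIES TIMESTAMP column (ts).
SELECT symbol, TIME_BUCKET('1m') AS t, AVG(price) AS avg_price
FROM tick
GROUP BY symbol, t
ORDER BY symbol, t;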

LangChain OnlinePDFLoader

Part 2 of the notebook loads the contents of a PDF document and vector embeddings into a table called fintech_docs. These operations were described in greater detail in a previous article. Replace <PDF document URL> with the link to your chosen FinTech document:

from langchain.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("<PDF document URL>")

data = loader.load()

The Fully Qualified Domain Name (FQDN) where the PDF file is located must be added to the firewall by selecting the Edit Firewall option in the top right.

We’ll use ChatGPT to answer questions about the PDF file, such as:

"What are the perfect funding alternatives in Blockchain?"

LangChain SQL Agent

Part 3 of the notebook contains LangChain agent operations on the data in the tick and stock_sentiment tables. This will be the main focus of this article.

First, we’ll install some libraries:

!pip install langchain --quiet
!pip install openai --quiet

Now, we’ll enter our OpenAI API Key:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Next, we’ll need some imports:

from langchain import OpenAI, SQLDatabase
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents import create_sql_agent

We’ll configure the LangChain toolkit and agent, as follows. In a SingleStoreDB Cloud notebook, a connection_url variable is available for the selected connection and database:

db = SQLDatabase.from_uri(connection_url)

llm = OpenAI(temperature = 0, verbose = False)

toolkit = SQLDatabaseToolkit(db = db, llm = llm)

agent_executor = create_sql_agent(
  llm = OpenAI(temperature = 0), toolkit = toolkit, verbose = False
)

We’ll now test the code using an example query, as follows:

agent_executor.run(
"""
    From the tick table, which stock symbol saw the least volatility in share trading in the dataset?
"""
)

Here is some example output:

'The stock symbol with the least volatility in share trading in the dataset is FTR.'

We’ll make this a little more interactive using the following code:

query = input("Please enter your question:")
agent_executor.run(query)

We’ll test this with an example query:

Using the symbol A, what is the most positive sentiment in the stock_sentiment table and the current best price for this symbol from the tick table?

Here is some example output:

'The most positive sentiment for symbol A is 0.9576 and the current best price is 46.43.'
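To keep asking questions, the interactive code could be wrapped in a simple loop; this is a sketch, not part of the original notebook:

# A sketch: answer questions until the user enters an empty line
while True:
    query = input("Please enter your question (blank to quit): ")
    if not query:
        break
    print(agent_executor.run(query))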

To see the chain output, we’ll set verbose to True, as follows:

agent_executor = create_sql_agent(
  llm = OpenAI(temperature = 0), toolkit = toolkit, verbose = True
)

An example of chain output from another database was shown in a previous article.

Bonus: Build a Visual Python Application

We’ll now build a simple Python application that uses OpenAI’s Whisper to ask questions about the database system. A great article inspired this application.

Install the Required Software

For this article, this is the software that was required on a clean installation of Ubuntu 22.04.2 running in a VMware Fusion Virtual Machine:

sudo apt install ffmpeg
sudo apt install libespeak1
sudo apt install portaudio19-dev
sudo apt install python3-tk
sudo apt install python3-pil python3-pil.imagetk

as well as the following packages:

langchain
matplotlib
openai
openai-whisper
pyaudio
pymysql
pyttsx3
wave

These can be found in the requirements.txt file on GitHub. Run the file as follows:

pip install -r requirements.txt

openai-whisper may take a while to install.

We’ll need to provide an OpenAI API Key in our environment. For example:

export OPENAI_API_KEY="<OpenAI API Key>" 

Replace <OpenAI API Key> with your key.

In each of the following applications, we have the following code:

s2_password = "<password>"
s2_host = "<host>"

We’ll replace <password> and <host> with the values that we saved earlier from SingleStoreDB Cloud.
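These values are used to connect to the database. As a sketch of how the connection URL might be built, assuming the admin user and port 3306 from earlier and pymysql (which is in the requirements) as the driver:

# A sketch: SQLAlchemy-style URL for LangChain's SQLDatabase
s2_user = "admin"
s2_port = 3306
s2_database = "timeseries_db"

connection_url = f"mysql+pymysql://{s2_user}:{s2_password}@{s2_host}:{s2_port}/{s2_database}"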

First Iteration

Let’s start with a simple visual Python application using Tkinter. We’ll also add voice recognition using OpenAI’s Whisper. The application allows up to 20 seconds of recorded speech; a sketch of its core flow follows Figure 1. It can be run as follows:

python3 record-transcribe.py

Example output is shown in Figure 1.

Figure 1. First Iteration.
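The full application is on GitHub. As a rough sketch of its core flow (hypothetical helper and file names; assuming pyaudio, wave and openai-whisper, and the agent_executor configured earlier), recording and transcription might look like this:

import pyaudio
import wave
import whisper

RECORD_SECONDS = 20         # the application allows up to 20 seconds of speech
SAMPLE_RATE = 16000         # assumption: 16 kHz mono audio suits Whisper well
CHUNK = 1024
FILENAME = "recording.wav"  # hypothetical temporary file name

def record_audio(filename):
    # Capture microphone input with PyAudio and write it to a WAV file
    audio = pyaudio.PyAudio()
    stream = audio.open(format = pyaudio.paInt16, channels = 1,
                        rate = SAMPLE_RATE, input = True,
                        frames_per_buffer = CHUNK)
    frames = [stream.read(CHUNK) for _ in range(int(SAMPLE_RATE / CHUNK * RECORD_SECONDS))]
    sample_width = audio.get_sample_size(pyaudio.paInt16)
    stream.stop_stream()
    stream.close()
    audio.terminate()
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(b"".join(frames))

def transcribe_audio(filename):
    # Transcribe locally with openai-whisper; the "base" model is an assumption
    model = whisper.load_model("base")
    transcript = model.transcribe(filename)
    return transcript["text"]

record_audio(FILENAME)
question = transcribe_audio(FILENAME)
print(agent_executor.run(question))  # agent_executor configured as shown earlier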

Second Iteration

In the next iteration, we’ll add an audio waveform; a sketch of the plotting code follows Figure 2. We’ll run the program as follows:

python3 record-transcribe-visualise.py

Example output is shown in Figure 2.

Figure 2. Second Iteration.
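The waveform in the repository version is drawn with matplotlib. Here is a minimal sketch of plotting the recorded audio, assuming the 16-bit mono WAV file from the first iteration:

import matplotlib.pyplot as plt
import numpy as np
import wave

# A sketch: plot the samples of a 16-bit mono WAV file against time
with wave.open("recording.wav", "rb") as wf:
    samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype = np.int16)
    duration = wf.getnframes() / wf.getframerate()

plt.plot(np.linspace(0, duration, num = len(samples)), samples)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Audio Waveform")
plt.show()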

Third Iteration

In the third and final iteration, we’ll remove the text-based response from the application and replace it with speech synthesis; a sketch follows Figure 3. We’ll run the program as follows:

python3 record-transcribe-visualise-speak.py

Example output is shown in Figure 3.

Figure 3. Third Iteration.
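Speech synthesis in the repository version uses pyttsx3. A minimal sketch of speaking the agent’s response (answer is a hypothetical variable holding the text returned by the agent):

import pyttsx3

answer = "The stock symbol with the least volatility is FTR."  # placeholder text

# A sketch: speak the agent's answer instead of displaying it as text
engine = pyttsx3.init()
engine.say(answer)
engine.runAndWait()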

In the code for all three iterations, we can call the OpenAI Whisper API instead of using the local Whisper installation. To do this, we would uncomment these lines of code in the transcribe_audio function:

# transcript = openai.Audio.transcribe(
#     model = "whisper-1",
#     file = audio_file,
#     language = "en"
# )

And comment out this line of code:

transcript = model.transcribe(filename)

However, calling the OpenAI Whisper API would incur additional costs. The local Whisper installation already provides excellent results.

Summary

In this article, we’ve seen that without creating vector embeddings, we were able to access our data quite effectively using LangChain. However, our queries need to be more focused, and we need to understand the database schema before asking questions. Integrating a speech capability allows our applications to be more widely used.