33

Does anyone know of a simple tool that will open up a CSV file and let you do basic, SQLesque queries on it? Like a graphical tool of sorts, one that is easy to use.

I know I could write a small script to do an import of the CSV into a SQLite database, but since I imagine someone else thought of this before me, I just wanted to inquire if one existed. What's prompting this question is I am getting frustrated with Excel's limited filtering capabilities.

Perhaps some other data visualization/manipulation tool would provide similar functionality.

Free or OSS is preferred, but I'm open to any suggestions.

EDIT:

I really would prefer some clear tutorials on how to do the below instead of just "make your sheet an ODBC entry" or "write programs using ODBC files", or more ideas on apps to use. Note: I cannot use MS Access.

Yet another EDIT:

I'm still open to solutions using SQLite. My platform is a semi-ancient Win2k laptop with a P4 in it. It's quite slow, so a resource-light solution is ideal and would likely get the win.

Dominique
  • 2,373
J. Polfer
  • 2,342

21 Answers

34

I think OpenOffice.org Database can do what you want. It works like this.

  1. Start OpenOffice.org Database; it shows the "Database Wizard".

  2. Select "Connect to an existing database: Text"

    enter image description here

  3. Specify the path to the text files as well as details like the separator character.

  4. Create and execute queries.

If you have ever worked with Microsoft Access you will find the GUI familiar.


If you can do without a GUI there are always the traditional UNIX commands. I use them a lot for simple queries on (small) CSV files. Here is how the SQL clauses map to commands:

clause     operation      command
---------------------------------
from       join           `join`
where      restriction    `grep`
order by   sorting        `sort`
group by   aggregation    `awk`
having     restriction    `grep`
select     projection     `cut`
distinct   deduplication  `uniq`
limit      restriction    `head`
offset     restriction    `tail`
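
For example, a query like SELECT name, salary FROM data WHERE dept = 'Sales' ORDER BY salary DESC LIMIT 10 could be approximated with a pipeline along these lines (a rough sketch; the file name, column order and comma delimiter are assumptions, and grep matches anywhere on the line rather than a single field):

# data.csv assumed to hold name,dept,salary with no header row
grep ',Sales,' data.csv | cut -d, -f1,3 | sort -t, -k2,2nr | head -n 10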
Gareth
  • 19,080
15

Have you tried LogParser?

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.

Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.

Here are a tutorial (and another one) on using the SQL-like query language with CSV files, which I found using Google.

Example Query:

logparser -i:CSV "SELECT TOP 10 Time, Count INTO c:\logparser\test\Chart.GIF 
FROM c:\logparser\test\log.csv ORDER by Time DESC" -charttype:bar3d
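
If you just want to see rows on the console rather than build a chart, here is a minimal sketch reusing the same file and columns as above (the WHERE threshold is made up):

logparser -i:CSV "SELECT Time, Count FROM c:\logparser\test\log.csv WHERE Count > 100 ORDER BY Time DESC"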
svandragt
  • 3,353
14

You can use ODBC to query text files:

Accessing Text Files using ODBC Data Provider

Note that you don't need MS Access for this to work; the tutorial in the above link just uses MS Access to create the text file. As you already have a text file, scroll down about halfway and start the tutorial where you see the title Accessing a Text File.

Update: I created a DSN on a .csv file myself to be able to create this step by step tutorial... here it comes:

  • Make sure your .csv file is in its own directory without anything else.
  • Open the "ODBC Data Source Administrator" (start - control panel - administrative tools - Data Sources (ODBC)).
  • Go to the File DSN tab and click on "Add...".
  • Choose "Microsoft Text Driver (*.txt, *.csv) from the list and click "Next >".
  • Give a name for your file data source (e.g. "test") and click "Next >".
  • Click "Finish" (After this, a dialog will appear where the "Data source name" and "Description" fields are indeed greyed out. This is normal. No worries.
  • Uncheck the "Use Current Directory" checkbox. The "Select Directory" button will be enabled.
  • Click the "Select Directory" button and nagivate to the folder in which you placed your .csv file in the first step.
  • Click on the "Options>>" button.
  • Click on the "Define Format..." button.
  • In the left "Tables" list, select your .csv file and click on the "Guess" button. (This will analyse your csv file and create an appropriate field for each column in your .csv file.)
  • Go through the generated columns (F1, F2, ...) in the right list, give them meaningful names and set the appropriate data type (the guessing is not always right).
  • Once everything is set up right, click "OK" (2 times).

At this point you should have a file DSN with which you can access your .csv file through ODBC. If you inspect the folder where the .csv file is placed, you'll see a schema.ini file, which contains the configuration you just created. When you have multiple .csv files, each one corresponds to a table, and each table has a [filename.csv] block in the schema.ini file in which the different columns are defined... You can also create or change that schema.ini file directly in a text editor instead of using the GUI described above.
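
For reference, a minimal sketch of what such a schema.ini block could look like for a comma-separated file with a header row (the file name, column names and types below are assumptions):

[filename.csv]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=25
Col1=Id Long
Col2=Name Text Width 50
Col3=Amount Double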

As for your extra question "how to connect to this ODBC provider using a query tool":
I have a tool which I wrote myself a long time ago which is not suitable for publication. But a quick Google search came up with odbc-view, a freeware tool which does what you want.
I downloaded and installed the tool.
After starting the tool:

  • Click on "DataSource...".
  • Select your File Data Source which you created previously (e.g. "test").
  • In the query pane type "select * from [filename.csv]".
  • Click "Execute".

You should see the contents of your .csv file in the lower pane now.
Hope this helps... Let me know how you do or if you need further assistance.

fretje
  • 10,732
11

I found that the simplest way to achieve this is to just use the built-in CSV import functionality of SQLite:

  1. sqlite3 mydatabase.sqlite
  2. sqlite> .mode csv
  3. sqlite> .import mydata.csv <tablename>

Now you have a working database that you can query however you like. I also found the performance of the above to be good; I just imported three million rows in 10-15 seconds.
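
From the same sqlite3 shell you can then query the imported table; a small sketch (the table and column names here are placeholders for whatever you used in the .import step):

sqlite> .headers on
sqlite> .mode column
sqlite> SELECT category, COUNT(*) AS cnt FROM mytable GROUP BY category ORDER BY cnt DESC LIMIT 10;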

8

I like using R to access CSV files quickly. While the language isn't directly SQL, you can do all of those things with simple commands in R. R also provides you with the ability to make nice graphs and a lot of other power.

Christian
  • 2,334
5

You can always read the file into Excel and use Excel as your data source via ODBC and run queries against it.

m.floryan
  • 153
5

I found a small, non-graphical app that does this: csvsql.

Documentation is here.
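
If this is the csvsql that ships with the csvkit package, a query can be run directly against the file roughly like this (a sketch under that assumption; the file and column names are made up, and the table name defaults to the file name without its extension):

csvsql --query "select dept, count(*) from mydata group by dept" mydata.csv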

4

You can take a look at the free tool q - Text as a Database, which allows executing SQL directly on csv files, including joins, grouping and any other SQL construct. It also includes automatic detection of column names and column types.

It's a command line tool which matches the Linux way of operation (e.g. piping from stdin if needed, special flags for customizing behavior, etc.).

It uses sqlite behind the scenes, so it's very lightweight and easy to use.
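
A minimal example of what a query looks like (the file and column names are assumptions; -d sets the delimiter and -H tells q the file has a header row):

q -d , -H "SELECT dept, COUNT(*) FROM ./mydata.csv GROUP BY dept"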

Full disclosure - It's my own open source tool. Hope you find it useful

Harel Ben-Attia

3

The H2 JDBC driver provides a very useful csvread function, allowing you to do things like the following:

select * from csvread('test1.csv') test1
inner join csvread('test2.csv') test2
on test1.id = test2.foreignkey

There are various ways to make use of this driver without having to delve into writing code to use it.

Personally, I prefer the Squirrel SQL Client, which gives you a nice GUI for running queries. To use it, all you need to do is point the already listed H2 In-Memory Driver classpath to the H2 driver you have downloaded. Once you've set up a suitable alias using the driver, you can run whatever random SQL you want. Results are displayed in a nice table, and there are all sorts of other features for importing, exporting, etc.

Alternatively, you can use Groovy to write a quick script to load and use the driver as necessary. See this example blogpost to find out how.

It appears someone has extended the above Groovy script and made it into a nice command line tool for running queries; see gcsvsql. With this you can run commands like the following:

gcsvsql "select * from people.csv where age > 40"
lstg
  • 101
  • 1
3

You may want to try the Q tool. It is very lightweight and only requires Python 2.5 or newer.

agrrd
  • 31
  • 1
2

There is sqliteviz, which I'm the author of. It is a free open-source offline-first PWA that allows you to convert a CSV file into an SQLite database which runs in the browser with sql.js. After that, you can filter your data with SELECT statements and create various Plotly charts from the result set.

Convert your CSV into a database

Choose your CSV file. Sqliteviz will show you the Import CSV dialogue with a data preview. You can change parsing settings such as the delimiter character.

Filter your data with SQL

Now all records from your CSV are available as rows in the csv_import table. Write your query and run it.
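
For example, a query along these lines runs against the imported table (the column name here is just a placeholder):

SELECT category, COUNT(*) AS cnt FROM csv_import GROUP BY category ORDER BY cnt DESC;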

Create a chart

After running a query you can switch the result panel to chart mode and create a chart with the react-chart-editor component. The same component, with some additional features, is used in Plotly Chart Studio. Explore its documentation to learn how to build charts with react-chart-editor. You can create various charts, from simple line/bar/pie charts to financial candlesticks and scientific visualisations.

In sqliteviz you can also save often-used queries and import/export them. See sqliteviz documentation.

Lana
  • 151
  • 2
2

A tool I've found that, I think, may make this easier in the future is Resolver One.

It's a spreadsheet that generates Python code that is easily modifiable. For those who are developers and occasionally need to "step down" to solve problems in spreadsheets, this seems like an intuitive way to solve spreadsheet-esque problems in a language they are familiar with.

And it gives me an excuse to use Python. Python makes me happy.

J. Polfer
  • 2,342
2

I have written a command line program to execute arbitrary SQL on csv files, including multi-file joins, called gcsvsql. You can read about it here:

http://bayesianconspiracy.blogspot.com/2010/03/gcsvsql.html

There is a Google Code project for it here: http://code.google.com/p/gcsvsql/

It's written in Java/Groovy, and will run anywhere Java is available.

Edit: Active project moved to github. https://github.com/jdurbin/durbinlib

1

TextQ can do that (disclaimer: I'm its developer). It can import a big CSV file and allows you to manage its schema/structure:

  • parse date and numbers in various formats;
  • rename or hide columns;
  • index columns for faster queries;

You can query via a UI query builder or SQL queries (it supports the full SQLite syntax). Any query result can be exported to a CSV file.

Here is a video demo and a more detailed tutorial.

You can get it from the Mac App Store or Microsoft Store (soon).

1

While not free, the best program I have found for this is File Query. Unlike the other solutions, which are either command-line based or require importing/setting up the file before accessing it, File Query lets you open a file (even one that is GBs in size) like a normal text editor, will automatically parse the layout for you, and lets you do almost all of your querying from simple dialogues.

It's a little pricey, but if you only need to do something once, you can always just use the 30-day trial for free. They also have great guides and even videos to get you started.

1

Two more options for this task: querycsv and fsql. Querycsv is Python and uses sqlite3. Fsql is Perl and uses DBD::CSV.

Nelson
  • 2,649
0

There is a Notepad++ plugin, CsvQuery, for running SQL queries on CSV files opened in Notepad++: https://github.com/jokedst/CsvQuery

Greck
  • 301
0

You can try RBQL: https://github.com/mechatroner/RBQL It allows you to execute SQL-like queries against CSV/TSV files with Python or JavaScript expressions, depending on the backend you choose. You can use RBQL as a standalone app, but it is also bundled with the "Rainbow CSV" plugins, which are available for the VSCode, Vim, Sublime Text and Atom text editors. It also works in web browsers, so you can test RBQL without installing it: just go to https://rbql.org, upload your CSV file and run a query.
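
A minimal sketch of an RBQL query, assuming the Python backend and RBQL's a1, a2, ... column variables:

select a1, a2 where int(a3) > 100 order by a1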

0

A software tool that allows performing SQL on CSV files? If your query is very simple, use csvsql or q, command-line tools that allow direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files). If you want to do more complex queries, or even want to query Excel files directly, try esProc. Refer to https://esprocforbp.medium.com/directly-query-excel-text-files-using-sql-5315788231e4. It not only supports ordinary SQL, but also GROUP BY with HAVING, subqueries, nested subqueries, joins, and even 'with…as' table expressions, for example:

$with A as
(select NAME as DEPT from E:/department.xlsx where NAME='HR' or NAME='Sales')
  select A.DEPT DEPT, count(*) NUM, avg(B.SALARY) AVG_SALARY from A
   left join E:/employee.xls B on A.DEPT=B.DEPT
   where B.GENDER='F' group by A.DEPT
Jerry
  • 1
0

Flatbase is a free web app that allows you to drag and drop CSV files from your desktop and query them using SQL. You can also graph the results, export as CSV, and share with your colleagues.

Full disclosure: I created Flatbase.

0

You can use WSH (Windows Script Host). For example, I have 4 files in the directory 'C:\Users\user837\Desktop\t4':

1.txt

id;sex_ref;sale
1;1;10
2;2;30
3;1;20

2.txt

sex_id;name
1;male
2;female

schema.ini

[1.txt]
Format=Delimited(;)
ColNameHeader=True
MaxScanRows=50
DecimalSymbol=,
[2.txt]
Format=Delimited(;)
ColNameHeader=True
MaxScanRows=50
DecimalSymbol=,

and Hello.js

WScript.Echo("Hello World!");
var cn = new ActiveXObject("ADODB.Connection");
// Open the folder as a database via the Jet text driver; each text file in it is exposed as a table
cn.Open("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"C:\\Users\\user837\\Desktop\\t4\";Extended Properties=\"text;HDR=NO;FMT=Delimited\"");

// Join 1.txt to 2.txt on their sex columns (column names come from the header rows)
var rs = cn.Execute("select * from 1.txt as t1 left join 2.txt as t2 on t1.sex_ref = t2.sex_id");

while (!rs.EOF) 
{
    WScript.Echo(           rs.Fields("id").Value
                  + "###" + rs.Fields("sex_ref").Value
                  + "###" + rs.Fields("name").Value
                );
    rs.moveNext();
}

Now just double-click on Hello.js and you'll see the SQL result line by line. See the WSH documentation for how to view all query results.

Oleg
  • 1