Questions tagged [parsing]

108 questions
34 votes, 10 answers

How can I parse extremely large (70+ GB) .txt files?

I have several .txt files with >30 million lines each, and anywhere from 20 to 40 "columns" (some comma-separated, some space-separated, all ASCII with lines separated by a newline). I don't need all (or even most) of the columns, and some of them…
hadrian4909
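A minimal sketch, assuming you know the wanted columns by zero-based position and each file is either comma- or whitespace-separated. It streams the file line by line instead of loading it, and keeps only the needed fields. The names select_columns and extract are illustrative, not from the question.

```python
from typing import Iterable, Iterator, List, Optional, Sequence


def select_columns(
    lines: Iterable[str], wanted: Sequence[int], sep: Optional[str] = None
) -> Iterator[List[str]]:
    """Yield only the wanted columns of each line, one line at a time.

    sep=None splits on runs of whitespace; pass sep="," for comma-separated
    files. Lines with too few fields are skipped instead of raising IndexError.
    """
    last = max(wanted)
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) <= last:
            continue  # short or malformed line
        yield [fields[i] for i in wanted]


def extract(in_path: str, out_path: str, wanted: Sequence[int],
            sep: Optional[str] = None) -> None:
    """Stream in_path and write the wanted columns to out_path, comma-joined."""
    # Iterating over a file object reads it lazily, one line at a time, so
    # memory use stays flat however large the input is.
    with open(in_path, encoding="ascii") as src, \
            open(out_path, "w", encoding="ascii") as dst:
        for row in select_columns(src, wanted, sep):
            dst.write(",".join(row) + "\n")
```

For example, extract("data.txt", "subset.csv", [0, 3, 7]) (hypothetical file names) would keep three columns. If the comma-separated files contain quoted fields with embedded commas, the standard csv module is safer than a plain split.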
20 votes, 8 answers

Automatic parsing of citation text in academic references

Is there any software (or pseudo-code) which can automatically scan a piece of text (either pasted into the tool, or read from a .doc/.pdf) and identify citation data using standard formats? The data would then be split up into its constituent…
16 votes, 8 answers

Copy/Pasting data from SQL Server to Excel splits up text into multiple columns?

I've got a problem pasting data from the result grid of SQL Server 2005 to an Excel 2007 spreadsheet. I have a query in SQL Server that returns 2 columns (a number column and a text column). On one computer I can happily copy (right-click > copy)…
Paul
8 votes, 5 answers

sed: extracting value of a key-value pair in a URL query string

I am trying to use sed to extract the value part of one of the many key-value pairs in a URL's query string. This is what I am trying: echo 'http://www.youtube.com/watch?v=abc&g=xyz' | sed…
markvgti
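Not a sed answer, but for comparison: a hedged Python sketch using the standard library's urllib.parse, which splits the query string into key-value pairs and percent-decodes the values, so no regex has to cope with parameter order. query_value is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def query_value(url: str, key: str) -> Optional[str]:
    """Return the first value of `key` in the URL's query string, or None."""
    params = parse_qs(urlsplit(url).query)  # {"v": ["abc"], "g": ["xyz"]}
    values = params.get(key)
    return values[0] if values else None
```

With the URL from the question, query_value('http://www.youtube.com/watch?v=abc&g=xyz', 'v') gives 'abc'.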
8 votes, 3 answers

Import json data into Excel

I have a text file in json format and want to read it into Excel. A very simplified example of the json file has the following structure: { [ { 'a': 10, 'b': 20 }, { 'a': 20, 'b': 22 }, { 'a': 11, 'b': 24 } ] } I want to convert it to Excel…
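The sample as shown is not valid JSON (single-quoted keys, and a bare array inside braces). Assuming the real file is a JSON array of flat objects such as [{"a": 10, "b": 20}, ...], one sketch is to convert it to CSV, which Excel opens directly. json_records_to_csv is a hypothetical name.

```python
import csv
import io
import json
from typing import List


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text that Excel can open."""
    records = json.loads(json_text)
    # Collect every key in first-seen order so objects with extra keys still fit;
    # missing keys are written as empty cells.
    fieldnames: List[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Writing the result to a file opened with newline="" and a .csv extension lets Excel load it with one column per key.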
8 votes, 1 answer

Is the ".sha256" file format formally defined somewhere? How should it be parsed?

I see a bunch of FOSS projects which have ".sha256" files. They look something like this: dsdfdfdsffdfsdfdsfdsfdsfdsfdsfds23r2ewrefdefdsfdsgfdsgffgfkgdfgg *meow.exe Asdfdfdsffdfsdfdsfdsfdsfdsfdsfds23r2ewrefdefdsfdsgfdsgffgfkgdfg3 …
Iago B.
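No claim here about a formal specification. As a sketch, assuming the file follows the layout GNU coreutils' sha256sum writes (and checks with sha256sum -c): 64 hex digits, a space, a mode character (a second space for text mode, * for binary mode), then the file name. parse_sha256_file is hypothetical, and it does not handle BSD-style "SHA256 (name) = digest" lines or the backslash-escaped names some tools write for unusual file names.

```python
import re
from typing import List, Tuple

# Assumed GNU sha256sum layout: 64 hex digits, one space, then a mode
# character (' ' for text mode, '*' for binary mode), then the file name.
_LINE = re.compile(r"^([0-9a-fA-F]{64}) ([ *])(.+)$")


def parse_sha256_file(text: str) -> List[Tuple[str, str, bool]]:
    """Return (digest, filename, binary_mode) tuples; blank lines are skipped."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: not in 'digest  name' form: {line!r}")
        digest, mode, name = m.groups()
        entries.append((digest.lower(), name, mode == "*"))
    return entries
```

Each parsed digest can then be compared with hashlib.sha256(data).hexdigest() for the named file's bytes.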
7 votes, 4 answers

Extracting a list of values from JSON file to Excel or a text file

I want to extract usernames from a JSON data file. [{"username": "Cobra", "user_id": 146231486, "event_type": 2, "title": null, "class_id": 4211, "war_state" : null, "superpower_expire_date": 1441178060.0, "role": 3, "event_state": 2, "avatar_id":…
WR20
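A minimal sketch, assuming the file is a JSON array of objects like the one in the excerpt: load it with the standard json module and keep the "username" field. extract_usernames is a hypothetical name.

```python
import json
from typing import List


def extract_usernames(json_text: str) -> List[str]:
    """Return the "username" field of every object in a JSON array."""
    records = json.loads(json_text)
    # Objects without a "username" key are skipped instead of raising KeyError.
    return [rec["username"] for rec in records if "username" in rec]
```

Writing "\n".join(names) to a .txt file gives one name per line, which also opens in Excel as a single column.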
7 votes, 8 answers

Remove a Linux file whose name is made of shell-sensitive characters

I've created a file named \;:$"\' to test some software of mine. I ended up with an error because I cannot delete the file properly. I'm trying to find a precise character combination to remove it via rm, but I cannot find a way. rm \\;:$\"\\\' rm:…
Alex
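One way to sidestep shell quoting entirely: Python's os.remove passes the name straight to the operating system, so none of \ ; : $ " ' need escaping. This assumes the name on disk is exactly the seven characters \;:$"\' (printing repr() of each os.listdir() entry is one way to confirm); remove_literal and ODD_NAME are hypothetical names.

```python
import os

# Assumption: the name on disk is exactly these seven characters: \;:$"\'
ODD_NAME = "\\;:$\"\\'"


def remove_literal(directory: str, name: str) -> bool:
    """Delete directory/name without any shell quoting; return True if removed.

    The name is used verbatim, so characters such as backslash, semicolon,
    dollar sign and quotes have no special meaning here.
    """
    path = os.path.join(directory, name)
    if os.path.lexists(path):
        os.remove(path)
        return True
    return False
```

Run from the directory holding the file, remove_literal(".", ODD_NAME) deletes it if the assumed name is right and returns False otherwise.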
6 votes, 3 answers

Breaking a file of strings into separate files based on the first letter (Bash)

Alright, so I have a file full of thousands of strings, each one on its own line. I want to make a script that takes this file, call it list.txt, takes the item from each line, and places it into a separate file based on the first…
Josiah
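Not Bash, but a hedged Python sketch of the same idea, assuming grouping is case-insensitive (A and a land in the same file) and that items starting with anything other than a letter or digit can share one "other" file. Both function names are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_first_letter(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group non-empty lines by lowercased first character.

    Lines starting with anything other than a letter or digit go under "other",
    so no output file name begins with "/" or ".".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        item = line.strip()
        if not item:
            continue
        first = item[0]
        key = first.lower() if first.isalnum() else "other"
        groups[key].append(item)
    return dict(groups)


def split_file(src_path: str, out_dir: str) -> None:
    """Write each group to out_dir/<key>.txt, one item per line."""
    with open(src_path, encoding="utf-8") as src:
        groups = group_by_first_letter(src)
    os.makedirs(out_dir, exist_ok=True)
    for key, items in groups.items():
        with open(os.path.join(out_dir, key + ".txt"), "w", encoding="utf-8") as out:
            out.write("\n".join(items) + "\n")
```

Lowercasing the key also avoids A.txt and a.txt colliding on case-insensitive file systems; split_file("list.txt", "by_letter") would write a.txt, b.txt, and so on.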
6 votes, 1 answer

Machine readable list of files in rar archive

I need a way to get a parsable list of all files in a .rar archive without extracting them, in Bash or Python. What I've tried: rar l *.rar and 7z l -slt *.rar. I've also looked at patool in Python, but it seems to be just a thin wrapper over rar. I…
5 votes, 3 answers

How do I append a string to the end of all lines?

I'm trying to add a string to the end of all lines in a text file, but I have a mistake somewhere. Example: I have this in a text…
rkifo
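A minimal sketch that inserts the suffix before each line's ending rather than after it. One common pitfall with files edited on Windows is the \r in \r\n endings, which makes text appended after it show up at the start of the line in a terminal; keeping the original ending intact avoids that. append_to_lines is a hypothetical name.

```python
from typing import Iterable, Iterator


def append_to_lines(lines: Iterable[str], suffix: str) -> Iterator[str]:
    """Yield each line with `suffix` added just before its line ending."""
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keeps "\n", "\r\n", or "" on a final unterminated line
        yield body + suffix + ending
```

To rewrite a file, read it with open(path, newline="") so \r\n endings are preserved, then write the joined output to a new file.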
4 votes, 1 answer

How do I parse file paths separated by a space in a string?

Background: I am working in Automator on a wrapper to a command line utility. I need a way to separate an arbitrary number of file paths delimited by a single space from a single string, so that I may remove all but the first file path to pass to…
4 votes, 5 answers

How to split a text file into multiple text files

I have a text file called entry.txt that contains the following: [ entry1 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3633…
Andrew
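A hedged sketch, assuming each section starts with a header line of the form [ entry1 ] on its own line and that every section should go to a file named after its header (entry1.txt, and so on). The excerpt is truncated, so the real layout and the desired output names may differ; both function names are hypothetical.

```python
import os
import re
from typing import Dict, Iterable, List, Optional

# Assumed header form: a line such as "[ entry1 ]" on its own.
HEADER = re.compile(r"^\[\s*([^\]\s]+)\s*\]$")


def split_sections(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each section name to the non-blank lines that follow its header.

    Lines before the first header are dropped.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        text = line.strip()
        m = HEADER.match(text)
        if m:
            current = m.group(1)
            sections.setdefault(current, [])
        elif text and current is not None:
            sections[current].append(text)
    return sections


def write_sections(src_path: str, out_dir: str) -> None:
    """Write each section to out_dir/<name>.txt, e.g. entry1.txt."""
    with open(src_path, encoding="utf-8") as src:
        sections = split_sections(src)
    os.makedirs(out_dir, exist_ok=True)
    for name, body in sections.items():
        with open(os.path.join(out_dir, name + ".txt"), "w", encoding="utf-8") as out:
            out.write("\n".join(body) + "\n")
```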
4 votes, 1 answer

How can I parse an XML file from the command line (for GeekTool)?

I'd like to find a Terminal command that can pull the file at http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=SOMEUSERNAME&count=1 and parse it to find a user's Twitter status. The status is inside the "statuses -> status -> text"…
stalepretzel
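A minimal sketch of the parsing half only, assuming the XML text has already been fetched (for example with curl -s) and has the statuses -> status -> text nesting described in the question. latest_status_text is a hypothetical name; the standard library's xml.etree.ElementTree does the parsing.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def latest_status_text(xml_text: str) -> Optional[str]:
    """Return the text of the first statuses/status/text element, or None."""
    root = ET.fromstring(xml_text)  # the root element is assumed to be <statuses>
    node = root.find("status/text")
    return node.text if node is not None else None
```

For GeekTool, a small script that calls print(latest_status_text(sys.stdin.read())) could sit at the end of a curl -s pipeline.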