I have been provided a data file in a format I have never seen. The data do not appear to be in columns, but rather in one long row. I can open the file in Notepad and see the data. So, the data do not appear to be encrypted.
When I open the data file in Notepad the row of data wraps back to the to left side of the Notepad window when I guess the data reach the maximum number of characters that Notepad allowed in a single row, and then the data continue in a new row.
There might be 10,000 rows of data when I open the file in Notepad. The data in one of these rows are not aligned with the data in the row above it or below it.
Here are some example data:
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 1304 3 0 0
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 0205 0 3 0
40001 1 5 GGGG 2998 HURG SU111111 95 1.0 F1 4 0805 0 2 0
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 1205 0 2 0
40001 1 5 GGGG 2998 HHHH SU111111 95 1.0 F1 4 1505 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999 1.0 F3 4 2003 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999 1.0 F3 4 2303 2 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999 1.0 F3 4 2703 3 0 0
40002 2 8 GGGG 2998 PPPP SK777777 -999
Notice that when I paste the example data here, representing one row in Notepad, the columns are 'magically' aligned.
I have found that I can open the data file in Excel and the data are also aligned. I do need to manually assign column boundaries in Excel however. And Excel does not allow me to assign a column boundary beyond more-or-less Character Space 123.
Below is SAS code to read the data file, although this SAS code does not work correctly. Rather I guess this SAS code skips some of the data rows. Notice that the variable TT covers character spaces 125-207, but that there are only 120 characters in most rows. There are more than 120 characters in some rows. This difference in the number of characters among rows I suspect is the reason SAS cannot read this data file correctly.
option linesize = 210 ;
option pagesize = 30 ;
FILENAME myinput 'C:/Users/markm/simple SAS programs/mydata.new' ;
DATA mydata ;
INFILE myinput ;
INPUT
AA 2-9
BB 12-17
CC 18-22
DD $ 24-27
EE 30-33
FF $ 35-38
GG $ 40-47
HH 53-56
II 59-64
JJ $ 66-68
KK $ 70-71
LL 72-78
MM 79-85
NN $ 87-90
OO 91-95
PP 97-104
QQ 105-110
RR 112-120
SS $ 122-123
TT $ 125-207 ;
If I move the cursor to the right one character at a time over the first row of data using the right-arrow key I have to press the right-arrow key twice to move beyond character space 120 in Notepad.
All of this is telling me there are hidden characters in the data file used to identify the end of a line of data.
I opened the data file in Vim hoping to see these hidden characters, but did not see anything. Vim did align the columns correctly when I opened the file. So, Vim must be seeing these hidden end-of-line characters.
How can I see these end-of-line characters myself? I suspect there is an option in Vim to reveal the hidden characters.
How can I determine the application that created this data file?
How can I modify the above SAS code to read this data file correctly?