I am working with the Google English 1-gram dataset (link here). It looks like the following:
C'ape   1804    1       1
C'ape   1821    1       1
C'ape   1826    1       1
C'ape   1838    2       2
C'ape   1844    1       1
C'ape   1869    1       1
C'ape   1874    1       1
C'ape   1878    2       2
C'ape   1879    1       1
C'ape   1880    1       1
CABMEL  1873    1       1
CABMEL  1874    1       1
CABMEL  1875    1       1
CABMEL  1879    1       1
CABMEL  1884    1       1
CABMEL  1890    1       1
CABMEL  1899    1       1
CABMEL  1901    1       1
CABMEL  1903    3       2
CABMEL  1910    2       2
CABMEL  1912    1       1
CABMEL  1915    1       1
CABMEL  1926    2       2
CABMEL  1927    3       2
CABMEL  1928    4       2
CABMEL  1930    2       2
Each row has at least 4 columns, and some rows have 5. The first column is the 1-gram, a string. I want to extract only those lines whose first column consists entirely of letters (upper- or lower-case alphabetic characters, nothing else). I think grep should be able to do it, but I cannot find the correct regex for the job. Is there a Unix utility that can easily get this done? I believe the columns are tab-delimited.
EDIT: For the sample above, the output should contain only the lines with CABMEL.
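
To make the requirement concrete, here is a minimal sketch of the kind of filter described above. It assumes the columns really are tab-delimited and that "letters" means ASCII A-Z and a-z only. The function name `filter_letters` is a hypothetical helper, not something from the dataset's tooling.

```shell
# filter_letters (hypothetical name): print only the lines whose first
# tab-separated field is made up entirely of ASCII letters.
# LC_ALL=C keeps the [A-Za-z] range limited to plain ASCII letters.
filter_letters() {
  LC_ALL=C awk -F'\t' '$1 ~ /^[A-Za-z]+$/' "$@"
}

# Small sample in the same shape as the data above (tabs between columns).
# The C'ape line is dropped because of the apostrophe; CABMEL lines are kept.
printf "C'ape\t1804\t1\t1\nCABMEL\t1873\t1\t1\nCABMEL\t1903\t3\t2\n" | filter_letters
```

On the real data this would be called with a file argument, e.g. `filter_letters 1gram.txt`, where the file name is a placeholder. If the columns turn out to be separated by arbitrary whitespace rather than tabs, dropping `-F'\t'` lets awk fall back to its default field splitting. A grep version along the same lines might be `LC_ALL=C grep -E "^[A-Za-z]+$(printf '\t')"`, which anchors the letters-only run at the start of the line and requires a tab right after it.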
 
     
    