I can't be the first person to have this issue, but hours of searching Stack Overflow turned up nothing close to an answer. I have an SSIS script that works over a directory of CSV files. This script folds, bends and mutilates these files: it performs queries, cleanses data, persists some data and finally outputs a small set to a CSV file that is ingested by another system.
One of the files has a free text field that contains the value: "20,000 BONUS POINTS". This one field, in a file of 10k rows, one of dozens of similar files, is the problem that I can't seem to solve.
Be advised: I'm weak on both C# and Regex.
Sample csv set:
4121,6383,0,,,TRUE
4122,6384,0,"20,000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
I found plenty of information on how to parse this using a variety of Regex patterns, but what I've noticed is that the StreamReader.ReadLine() method appears to wrap the complete line in double quotes:
"4121,6383,0,,,TRUE"
such that the output of the regex Replace method:
s = Regex.Replace(line, @"[^\""]([^\""])*[^\""]", m => m.Value.Replace(",", ""));
looks like this:
412163830TRUE
and the target line that actually contains a double quote delimited string ends up looking like:
"412263840\"20000 BONUS POINTS\"TRUE"
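To see which spans the pattern is actually matching, a small standalone check may help. This is only a sketch: the class name PatternDemo, the helper MatchedSpans and the hard-coded sample line are assumptions made for illustration and are not part of the SSIS script.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Hypothetical helper: returns each span the pattern matches in one line.
    public static List<string> MatchedSpans(string line)
    {
        // Same pattern as in the Replace call: a run of two or more
        // characters that are not double quotes.
        string pattern = @"[^\""]([^\""])*[^\""]";
        var spans = new List<string>();
        foreach (Match m in Regex.Matches(line, pattern))
        {
            spans.Add(m.Value);
        }
        return spans;
    }

    public static void Main()
    {
        string line = "4122,6384,0,\"20,000 BONUS POINTS\",,TRUE";
        foreach (string span in MatchedSpans(line))
        {
            Console.WriteLine("[" + span + "]");
        }
        // Prints:
        // [4122,6384,0,]
        // [20,000 BONUS POINTS]
        // [,,TRUE]
    }
}
```

Every comma in the line falls inside one of these spans, which would be consistent with the comma-free output shown above.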
My entire method (for your reading pleasure) is this:
        string fileDirectory = "C:\\tmp\\Unzip\\";
        string fullPath = "C:\\tmp\\Unzip\\test.csv";
        string line = "";
        //int count=0;
        List<string> list = new List<string>();
        try
        {
            //MessageBox.Show("inside Try Block");
            string s = null;
            StreamReader infile = new StreamReader(fullPath);
            StreamWriter outfile = new StreamWriter(Path.Combine(fileDirectory, "output.csv"));
            while ((line = infile.ReadLine()) != null)
            {
                //line.Substring(0,1).Substring(line.Length-1, 1);
                Console.WriteLine(line);
                // Strip commas from every span the pattern matches.
                s = Regex.Replace(line, @"[^\""]([^\""])*[^\""]",
                                        m => m.Value.Replace(",", ""));
                System.Console.WriteLine(s);
                list.Add(s);
            }
            foreach (string item in list)
            {
                outfile.WriteLine(item);
            }
            infile.Close();
            outfile.Close();
            //System.Console.WriteLine("There were {0} lines.", count);
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
        //another addition for TFS consumption
    }
Thanks for reading, and if you have a useful answer, bless you and your progeny for generations to come!
mfc
EDIT: The requirement is valid CSV output. For the test data, it would look like this:
    4121,6383,0,,,TRUE
    4122,6384,0,"20000 BONUS POINTS",,TRUE
    4123,6385,,,,
    4124,6386,0,,,TRUE
    4125,6387,0,,,TRUE
    4126,6388,0,,,TRUE
    4127,6389,0,,,TRUE
    4128,6390,0,,,TRUE
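
For illustration, the transformation this requirement describes can be sketched as a replace that touches only text between a pair of double quotes. This is a minimal sketch, not a tested fix for the SSIS script: the class QuotedCommaDemo and the helper StripCommasInsideQuotes are hypothetical names, and it assumes quoted fields contain no escaped quotes ("") and no embedded line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaDemo
{
    // Hypothetical helper: removes commas only from quoted fields.
    // Assumes no escaped quotes ("") and no line breaks inside a field.
    public static string StripCommasInsideQuotes(string line)
    {
        // Match an opening quote, any run of non-quote characters,
        // and the closing quote; strip commas inside that match only.
        return Regex.Replace(line, "\"[^\"]*\"",
                             m => m.Value.Replace(",", ""));
    }

    public static void Main()
    {
        Console.WriteLine(StripCommasInsideQuotes(
            "4122,6384,0,\"20,000 BONUS POINTS\",,TRUE"));
        // Prints: 4122,6384,0,"20000 BONUS POINTS",,TRUE

        Console.WriteLine(StripCommasInsideQuotes("4121,6383,0,,,TRUE"));
        // Prints: 4121,6383,0,,,TRUE
    }
}
```

Lines without a quoted field pass through unchanged, so the field separators survive. Files whose fields contain escaped quotes would need a real CSV parser instead of a single pattern.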
 
     
    