printf(Hello World!\n)
tech blog

views: 1110

Streaming NULL values to PostgreSQL

03-13-2020

This particular issue caught me by surprise. I was pushing data to a table with an optional NULL column from a CSV, using pqxx::stream_to(). When generating the CSV file for the push, I decided to let the blank string “” stand in for my NULL values.

sample data:

id      name    tag             number  conurb  is_public                                                                   
1       Boring  BORING_OR       200             f                                                                      
2       Glide   GLIDE_OR        201     1001    f

When using straight COPY, this worked perfectly. After converting to stream_to(), however, I encountered the following error:

ERROR:  invalid input syntax for integer: ""
CONTEXT:  COPY city_table, line 1, column conurb: ""

Here’s the code:

std::string cityFile = pathJoin(outputDir, "cities.csv");
if( fileExists(cityFile) ){
   std::string file = cityFile;                                                                                                                                            
   cout << "\t* " << file << endl;                                                                                                                                         
   pqxx::work txn(conn);                                                                                                                                                   
   CSV csv("\t");                                                                                                                                                          
   if( csv.loadCSV( file, citiesCSVHeader, true) ){                                                                                                                      
      pqxx::stream_to stream( txn, "city_table", citiesCSVHeader );
      const std::vector<CSVRow>& rows = csv.getRows();
      for( auto &row : rows ){
         stream << std::make_tuple(row[0], row[1], row[2], row[3], row[4], row[5]);
      }
      stream.complete();
   }
   else{ cout << "\tunable to load cities file: " << file <<endl; }                                                                                                 
   txn.commit();
} else { cout << "No cities file." << endl; }

Here, I’m using an internal class to convert each line from the CSV file into a CSVRow (basically a vector of strings). So stream << std::make_tuple( row[0], row[1], row[2], row[3], row[4], row[5]); is taking six string values and pushing them to the database. Somewhere under the hood, the values for columns 1, 4 and 5 are getting converted to integers (and column 6 into a boolean type).

The problem is, it doesn’t seem to be converting the empty string to NULL in the same way that COPY does. I tried several work-arounds, like using “NULL” or “\N” (PostgreSQL’s standard NULL string for text streaming) in place of “” in the file. That didn’t work.

I knew that PostgreSQL allowed you to specify a custom null string when using COPY. pqxx::stream_to(), however, does not allow you to set that option when creating the stream.

Finally, I reached out to the (very active!) libpqxx team over at GitHub, and got a quick response. They suggested I tried passing a NULL value to the tuple, or try using std::optional.

Just to test, I checked that NULL worked as expected:

for( auto &row : rows ){
    stream << std::make_tuple(row[0], row[1], row[2], row[3], NULL, row[5]);
}

. . . and confirmed that the push worked as expected.

Then I wrote a helper method that I could wrap around any column values that could be NULL:

std::optional<std::string> nl(std::string s) { 
    if (s != "") { return s; } 
    else { return {} }
}

I hadn’t used std::optional before, but it was perfect for this. According to documentation at cppreference.com:

The class template std::optional manages an optional contained value, i.e. a value that may or may not be present.

That meant I could return NULL in place of the empty string and allow the stream to move forward as expected. Here’s the finished code:

std::string cityFile = pathJoin(outputDir, "cities.csv");
if( fileExists(cityFile) ){
   std::string file = cityFile;                                                                                                                                            
   cout << "\t* " << file << endl;                                                                                                                                         
   pqxx::work txn(conn);                                                                                                                                                   
   CSV csv("\t");                                                                                                                                                          
   if( csv.loadCSV( file, citiesCSVHeader, true) ){                                                                                                                      
      pqxx::stream_to stream( txn, "city_table", citiesCSVHeader );
      const std::vector<CSVRow>& rows = csv.getRows();
      for( auto &row : rows ){
         stream << std::make_tuple(row[0], row[1], 
                                   row[2], row[3], 
                                   nl(row[4]),
                                   row[5]);
      }
      stream.complete();
   }
   else{ cout << "\tunable to load cities file: " << file <<endl; }                                                                                                 
   txn.commit();
} else { cout << "No cities file." << endl; }

And now the code works as expected!

Disadvantages to this approach: I will need to remember to wrap all values that *could be* null in this helper function, or risk my database push breaking during run time. I could wrap everything in the nl() function, I suppose, and perhaps I will make that change.

As always, if this helps just one person out there, the hour or so I spent on this will be well worth it! Special shout-out again to the libpqxx team over at GitHub — you guys are the best.