Sunday 24 March 2013

Reading Blobs from SQL Server in chunks

There are many reasons why we might want to store blobs in SQL Server instead of a path to a file. Security might be a very good reason.

We can use the varbinary(max) type in SQL Server to store blob data.

So, let's say that we managed to get some blob data into SQL Server. We can simply use ADO.NET or Entity Framework to read this data; that task is straightforward. Now consider the scenario where there is a large blob and we want to read it in chunks. We might do this to optimise application performance (streaming) or to reduce network traffic. We do not want our applications to "hang" while blobs are being read from SQL Server!

I have created a very simple table, and the schema looks like below: 
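A minimal version might look like this; the table name "BlobStore" and the "Id" column are just my choices, and the only thing the rest of the post relies on is a varbinary(max) column called "Payload":

-- A very simple table for storing blobs.
-- Only the "Payload" column is used by the code later in this post.
CREATE TABLE dbo.BlobStore
(
    Id      INT IDENTITY(1,1) PRIMARY KEY,
    Payload VARBINARY(MAX) NOT NULL
);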



Let's get some binary data into our table:
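One way to do this is with OPENROWSET and the BULK option; a sketch, where the table name and the file path are placeholders:

-- Load a file from disk into the "Payload" column as a single blob.
INSERT INTO dbo.BlobStore (Payload)
SELECT BulkColumn
FROM OPENROWSET(BULK N'C:\Temp\Sample.pdf', SINGLE_BLOB) AS FileData;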


Make sure to replace the path with a file on your own system.

I will be using Visual Studio Express 2012. Instead of just showing the code, I would like to highlight the most important points. 

The first stage is to make a connection to the database. For this, we can simply use the SqlConnection class. Thereafter we will use the SqlDataReader class to read the data. The following is the first part of the class.
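A sketch of that first part, assuming a class called "BlobReader"; the names "CommandString" and "ChunkSizeBytes" come from the discussion below, while the class name, table name and WHERE clause are my own choices:

using System;
using System.Data;
using System.Data.SqlClient;

public class BlobReader
{
    // SQL used to read the blob; DATALENGTH returns the total
    // number of bytes stored in the "Payload" column.
    private const string CommandString =
        "SELECT Payload, DATALENGTH(Payload) FROM dbo.BlobStore WHERE Id = 1";

    // Maximum size of a chunk: 1MB.
    private const int ChunkSizeBytes = 1024 * 1024;

    private readonly SqlConnection connection;

    public BlobReader(string connectionString)
    {
        // Set your own connection string.
        connection = new SqlConnection(connectionString);
    }
}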


Don't worry too much about this; just keep an eye on the SqlConnection and SqlDataReader. In the constructor, I have created an instance of SqlConnection with a connection string. Again, you will need to set your own connection string.

The next point is the "CommandString". This is the SQL I will be using to read data from our table. The keyword that stands out in this string is "DATALENGTH". We use DATALENGTH to find out the size (length) of the byte stream stored in the "Payload" column.

The DATALENGTH function returns the number of bytes in the "Payload" column. We will be reading 1MB at a time, and I have created the "ChunkSizeBytes" variable to store the maximum size of a chunk.

We can now open the connection to the database.
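Something like this, as another method on the BlobReader class above:

// Opens the connection and executes the command, streaming the blob.
private SqlDataReader ExecuteReader()
{
    var command = new SqlCommand(CommandString, connection);
    connection.Open();

    // SequentialAccess reads the blob as a stream instead of loading
    // the whole value; CloseConnection closes the connection when
    // the reader is closed.
    return command.ExecuteReader(
        CommandBehavior.SequentialAccess | CommandBehavior.CloseConnection);
}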

In the above method, I have used the SqlCommand class and initialised it with the "CommandString" and the connection. Thereafter we can execute the command with the "SequentialAccess" command behavior. This is the most important piece of the puzzle. By using this behavior we are essentially configuring our connection not to read the entire blob at once, but to read it as a stream. The next behavior is "CloseConnection", which closes the connection when the reader has finished reading.

The next step is to read the data.
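A sketch of the reading method; the callback takes the buffer and the number of bytes actually read (its exact signature is my assumption):

// Reads the "Payload" column in chunks of up to ChunkSizeBytes,
// invoking the callback once per chunk.
public void Read(Action<byte[], int> callback)
{
    using (SqlDataReader reader = ExecuteReader())
    {
        if (!reader.Read())
            return;

        long offset = 0;
        long bytesRead;

        do
        {
            // Re-initialised on every iteration; reusing the buffer
            // would be an easy optimisation.
            var bytes = new byte[ChunkSizeBytes];

            // GetBytes copies up to ChunkSizeBytes from the "Payload"
            // column (ordinal 0) into the buffer, starting at "offset".
            bytesRead = reader.GetBytes(0, offset, bytes, 0, ChunkSizeBytes);
            offset += bytesRead;

            if (bytesRead > 0)
                callback(bytes, (int)bytesRead);
        }
        while (bytesRead == ChunkSizeBytes);
    }
}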


The above method reads the "Payload" data in 1MB chunks, using the "GetBytes" method to read each chunk. Yes, there are many optimisations you can make to this method; the first is not to re-initialise the "bytes" variable on every iteration, but to reuse it.
What is basically happening here is that the "callback" delegate is invoked once for each chunk.

The Console application looks like below:
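Something along these lines; the connection string and output path are placeholders:

using System;
using System.IO;

class Program
{
    static void Main()
    {
        var blobReader = new BlobReader(
            @"Data Source=.\SQLEXPRESS;Initial Catalog=BlobDb;Integrated Security=True");

        // Write each chunk to a file as it arrives.
        using (var file = new FileStream(@"C:\Temp\Output.pdf", FileMode.Create))
        {
            blobReader.Read((bytes, bytesRead) =>
            {
                Console.WriteLine("Read {0} bytes", bytesRead);
                file.Write(bytes, 0, bytesRead);
            });
        }
    }
}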


Nothing major; I simply write the bytes to a file. When I executed the application, I could see that the delegate was called 5 times and that we did not receive the full 1MB on the last call.


The resulting file is 4.6MB, which maps directly to the results we saw in the console application.

My PDF works!


Enjoy! (BTW - code is here)