DataTable Performance

Using (or providing) Microsoft.NET Classes

Postby Azal on Wed Jan 13, 2010 12:10 pm

When converting IEnumerable instances to APL arrays I noticed poor performance. For example, getting an array of values from a System.Data.DataTable by executing:
⊃(⌷dt.Rows).ItemArray
can take an hour when the DataTable has a substantial number of rows (I checked with the AdventureWorks database: table [Sales].[SalesOrderHeader] has ~30000 rows with 27 columns).
I tried the same thing in Visual Studio (VB.NET) and received an array of System.Object within half a second. Compiling the VB code into an assembly and calling the VB function from Dyalog APL does not give the same result. In my opinion the issue is not the looping over values but bringing an array of instances from .NET into the APL environment.

Any suggestions?
Azal
 
Postby Morten|Dyalog on Fri Jan 15, 2010 8:35 am

Bringing a large number of objects into the workspace (not only an object per row, but perhaps dates or other objects in the data cells) does not perform terribly well. We have some ideas for generally improving this, but they won't get implemented for some time. What we HAVE done is to develop an I-Beam which targets DataTable objects specifically, and can be used to convert objects to strings as they come in (and vice versa when they go out). This improves performance by a couple of orders of magnitude. The new I-Beam has been distributed as a DSS patch to v12.1 (for a customer who needed it). I now need to spend a little time beating the documentation into shape, and hope to upload it within a day or two.
Morten|Dyalog
 
Postby Azal on Fri Jan 15, 2010 12:24 pm

I investigated the issue more deeply and found the main problem: only .NET instances are slow to retrieve from the CLR environment. As I understand it, the Dyalog .NET interface has built-in conversions for some CLR types: System.String becomes a character vector, and any numeric type (except System.Decimal) becomes a numeric value. I decided to write my own shared function in VB.NET (System.DateTime -> IDN format, System.Decimal -> APL numeric values, System.Guid -> character vector...). It works well and performance increased up to a hundred times, but I still have one last issue: System.DBNull. I can't find a VB value that arrives as ⎕NULL in Dyalog APL. If it is VB "Nothing", APL receives a .NET object, which spoils the idea. I tried Nothing, vbEmpty, vbNull, System.Runtime.InteropServices.VarEnum.VT_NULL and VT_EMPTY; none of them helped :-(. In my opinion the Dyalog .NET interface should translate Nothing to ⎕NULL; that would be logical.
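The idea behind Azal's shared VB.NET function is to turn every "rich" cell value into a plain number or string before it crosses into APL. Below is a minimal sketch of that per-cell conversion, using Python stand-ins for the CLR types (datetime for System.DateTime, Decimal for System.Decimal, UUID for System.Guid). It is not his code. Dates are converted to OLE Automation dates (the values DateTime.ToOADate returns, counted from 30 December 1899) rather than IDNs, so as not to assume the IDN epoch.

```python
from datetime import datetime
from decimal import Decimal
from uuid import UUID

# Epoch of OLE Automation dates, the scale DateTime.ToOADate uses.
OA_EPOCH = datetime(1899, 12, 30)

def to_primitive(cell):
    """Convert one 'rich' cell value to a plain number or string.

    Python stand-ins for the CLR types in the post:
    datetime ~ System.DateTime, Decimal ~ System.Decimal, UUID ~ System.Guid.
    """
    if isinstance(cell, datetime):
        # Days since 1899-12-30, with the time of day as a fraction.
        return (cell - OA_EPOCH).total_seconds() / 86400
    if isinstance(cell, Decimal):
        # May lose precision: Decimal keeps 28-29 digits, a double ~15-17.
        return float(cell)
    if isinstance(cell, UUID):
        return str(cell)  # 36-character hyphenated form
    return cell  # str, int, float and None pass through unchanged

def convert_rows(rows):
    """Convert a whole table (a list of row tuples) in one pass."""
    return [[to_primitive(c) for c in row] for row in rows]
```

For example, `to_primitive(datetime(2000, 1, 1))` gives `36526.0`. Note that `None`, the stand-in for DBNull, is passed through untouched. That is exactly the gap Azal describes.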

Translating all data to strings is not the best idea. Most simple CLR types have a good equivalent in Dyalog APL, and it is much better to have the value 1 than '1'. A DateTime could become either an IDN with fractional time (DateTime.ToOADate method) or a numeric vector YYYY MM DD HH mm SS.SSS, like the one received from ADO Recordset.GetRows. A Guid could become a string or a numeric vector of bytes. It would be better to give us a choice of how to convert, say with a vector of translations, for example:
Matrix←('' 'IDN' 'Bytes') I-Beam DataTable
where an empty value means the default conversion.
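Azal's proposed per-column conversion vector can be sketched as a lookup from conversion names to functions, applied column by column. Everything here is hypothetical: the names, the `CONVERTERS` table and `convert_table` are made up for illustration. 'OADate' stands in for his 'IDN' to avoid assuming the IDN epoch. 'Bytes' uses Python's RFC 4122 byte order, which differs from .NET's Guid.ToByteArray for the first three fields.

```python
from datetime import datetime
from uuid import UUID

OA_EPOCH = datetime(1899, 12, 30)

# Hypothetical named conversions, chosen per column; '' means "default".
CONVERTERS = {
    "": lambda v: v,
    "OADate": lambda d: (d - OA_EPOCH).total_seconds() / 86400,
    "Parts": lambda d: [d.year, d.month, d.day, d.hour, d.minute,
                        d.second + d.microsecond / 1e6],
    "String": str,
    "Bytes": lambda g: list(g.bytes),  # 16 byte values (RFC 4122 order)
}

def convert_table(spec, rows):
    """Apply the conversion named in spec[j] to every value in column j.

    Null cells (None) are left as None instead of being converted.
    """
    funcs = [CONVERTERS[name] for name in spec]
    if any(len(row) != len(funcs) for row in rows):
        raise ValueError("spec length must match the number of columns")
    return [[None if v is None else f(v) for f, v in zip(funcs, row)]
            for row in rows]
```

A call shaped like his example would be `convert_table(["", "OADate", "Bytes"], rows)`. Keeping the spec separate from the data means the same table can be fetched with different conversions without changing the conversion code.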
Azal
 
Postby Azal on Fri Jan 15, 2010 7:48 pm

>I still have one last issue: System.DBNull
Found a solution: return System.DBNull.Value for the System.DBNull "cells" in the DataTable. In this case no .NET objects are brought into APL, and the resulting matrix can be searched for System.DBNull.Value to locate the null values.
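The workaround relies on System.DBNull.Value being a single shared instance. If every null cell holds that same marker, one search over the result finds all the nulls. Here is a minimal Python sketch of the same idea, with a module-level `DBNULL` object as a hypothetical stand-in for System.DBNull.Value and `None` as the incoming null:

```python
# Stand-in for System.DBNull.Value: ONE shared marker for every null cell.
DBNULL = object()

def mark_nulls(rows):
    """Replace missing values (None here) with the shared marker."""
    return [[DBNULL if v is None else v for v in row] for row in rows]

def null_positions(matrix):
    """(row, column) of every cell holding the marker, found by identity."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, v in enumerate(row)
            if v is DBNULL]
```

Because the marker is one object, the test is a cheap identity comparison rather than a per-cell type check.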

Morten, would leaving out the table of object members that is brought across together with each instance give any significant performance improvement?
Azal
 
Postby Morten|Dyalog on Mon Jan 18, 2010 6:25 pm

The I-Beams that we have developed for DataTables (I hope to post the documentation tomorrow) allow you to specify which value to translate DBNulls to (typically 0 for numeric and '' for character columns). This is based on purely pragmatic assumptions about what is "most likely to be useful" to an APL application receiving the data. The same option is available when sending data back to a DataTable.
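The thread does not show the I-Beam's syntax, so what follows is only a hypothetical sketch of the behaviour described: one substitute value per column for nulls coming in (for example 0 for numeric and '' for character columns), and the reverse mapping going out. `nulls_in` and `nulls_out` are made-up names.

```python
def nulls_in(rows, subst):
    """Incoming: replace a null (None) in column j with subst[j]."""
    return [[subst[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def nulls_out(rows, subst):
    """Outgoing: treat subst[j] in column j as a null again."""
    return [[None if v == subst[j] else v for j, v in enumerate(row)]
            for row in rows]
```

The pragmatic trade-off is visible on the way back. A genuine 0 or '' in a column is indistinguishable from a substituted null, so `nulls_out([[0, "b"]], [0, ""])` yields `[[None, "b"]]`. Choosing substitutes that cannot occur in the real data avoids this.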

The key to achieving high performance is to avoid moving large numbers of objects between the .NET Universe and APL.
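The principle can be illustrated independently of .NET. Instead of handing over rows × columns individual objects, pack the table into one contiguous container per column, so the number of items that cross the boundary equals the column count. The sketch below uses Python's `array` module as a stand-in for a bulk transfer buffer. The thread does not describe how the I-Beams work internally, so this is an assumption about the general technique, not their implementation.

```python
from array import array

def to_columns(rows, typecodes):
    """Pack a row-oriented table into one flat container per column.

    typecodes[j] is an array-module code ('q' = 64-bit int, 'd' = double)
    for a numeric column, or None to keep a plain list (e.g. strings).
    """
    cols = [[] for _ in typecodes]
    for row in rows:
        for j, col in enumerate(cols):
            col.append(row[j])
    return [array(tc, col) if tc else col
            for tc, col in zip(typecodes, cols)]
```

For the table in the first post (~30000 rows, 27 columns) this would mean 27 containers instead of roughly 810000 cell objects.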
Morten|Dyalog
 
Postby Morten|Dyalog on Thu Jan 21, 2010 6:33 pm

It took a bit more than the day I was hoping for, but the first "Dyalog Technical Note", describing the DataTable I-Beams, is now ready. It will be up on our web page shortly.
Morten|Dyalog
 
Postby Morten|Dyalog on Tue Jan 26, 2010 6:54 am

The PDF has been recreated; there were problems copying and pasting APL statements from the earlier one.
Attachments
DNOTE 1 DataTable I-Beams.pdf
(587.88 KiB)
Morten|Dyalog
 