Friday, April 28

Refactoring Databases - Reducing the Risk

I am reading the book "Refactoring Databases: Evolutionary Database Design" by Scott W. Ambler and Pramod J. Sadalage at the moment.

The biggest difficulty in refactoring a database is that multiple client applications (which you may or may not know about) can access it in ways that are inconsistent with good programming practice, such as issuing "select * from table" and then referring to the returned columns by position rather than by name. I wrote an article a few months ago about the strengths and weaknesses of using stored procedures.

If you want to remove a column, you therefore need to be sure you have found and changed every client application.

The risk of doing this would be greatly reduced if there were a way to trigger on select statements. A "select *" could then be intercepted and replaced with a select statement that returned the columns in the original order, with a null column in place of the one you had just removed. You could also record each select in an audit table, and use that to check whether you have found all your client applications.

Unfortunately, you can only trigger on insert, update and delete statements. Dejan Sarka describes a workaround using the profiler and a FOR INSERT trigger on the table it generates. However, this slows things down, and with the increased use of agile practices there is a good case for letting you trigger on select statements directly.

Until databases support this functionality, a first stage of any refactoring should be to reduce the number of ways in which you can connect to a database, and to create an interface that is as small as possible. I discussed the arguments for and against using stored procedures in January. I would now be more inclined to use SPs to connect to the database. If there are a small number of entry points to the database, there are fewer things to test when you refactor, and you are more likely to keep your database schema up to date.
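As an illustrative sketch of the "small interface" idea (in Java, with made-up names, since the post doesn't describe a specific API): if all client code goes through one narrow interface, a refactoring only has to consider its few methods rather than every ad-hoc query scattered across the applications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class NarrowInterfaceDemo {

    // Hypothetical example: the only entry points client code may use.
    // Removing or renaming a column means re-testing these two methods,
    // not hunting down every "select *" in every application.
    interface CustomerStore {
        Optional<String> findName(int id);
        void rename(int id, String name);
    }

    // In-memory stand-in; a real implementation would call a small set
    // of stored procedures rather than issuing ad-hoc SQL directly.
    static class InMemoryCustomerStore implements CustomerStore {
        private final Map<Integer, String> rows = new HashMap<>();

        public Optional<String> findName(int id) {
            return Optional.ofNullable(rows.get(id));
        }

        public void rename(int id, String name) {
            rows.put(id, name);
        }
    }

    public static void main(String[] args) {
        CustomerStore store = new InMemoryCustomerStore();
        store.rename(1, "Ann");
        System.out.println(store.findName(1).orElse("?")); // prints Ann
    }
}
```

The point is not the in-memory map but the shape: clients depend on two methods, so the schema behind them can change freely.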

Friday, April 21

Relative click-through rates depending on your rank in Google

I came across this study, which describes the relative likelihood that a user will click on your link in Google's results, depending on the rank it achieves in the search.

Rank 1 - 100%
Rank 2 - 100%
Rank 3 - 100%
Rank 4 - 85%
Rank 5 - 60%
Rank 6 - 50%
Rank 7 - 50%
Rank 8 - 30%
Rank 9 - 30%
Rank 10 - 20%

It's interesting that the first three positions receive equal click-through rates (so there's little to gain in pushing from position 3 to position 1), and that the percentages drop off quickly below that, once users have to scroll to see your result.

Wednesday, April 19

Why does parallelism need to be so complicated?

Creating applications that use multiple threads in languages such as C# and Java is easier than it was in earlier languages, but it is still quite complex. As a result, people tend to use threads only when a particular case makes them vital.

As microprocessor clock frequencies reach their limits, manufacturers are increasingly turning to multi-core designs to gain performance. In a few years' time, processors may have 16-32 cores, but at clock frequencies similar to today's.

This means that creating applications that do things in parallel needs to be easier, but programmers don't want to learn a new language.

If you have some code like


{
    doA();
    doB();
    doC();
}
doD();


and you don't care what order the functions are executed in, could you use a new form of brackets?


{[
    doA();
    doB();
    doC();
]}
doD();


This will execute all three functions. They may run in any order, sequentially or in parallel; how they are executed depends on the compiler and operating system, which may create new threads or, if the overhead of doing so is too great, may not. When all three functions have finished, doD() will be executed.
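The bracket notation above doesn't exist in any mainstream language, but as a rough sketch, much the same "run in any order, then continue" semantics can be written in Java today with an ExecutorService (doA() and friends here are hypothetical stand-ins that just record their names):

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelBlock {
    static final Queue<String> log = new ConcurrentLinkedQueue<>();

    static void doA() { log.add("A"); }
    static void doB() { log.add("B"); }
    static void doC() { log.add("C"); }
    static void doD() { log.add("D"); }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // invokeAll blocks until every task has finished, in whatever
        // order and on whatever threads the runtime chooses.
        pool.invokeAll(List.of(
                Executors.callable(ParallelBlock::doA),
                Executors.callable(ParallelBlock::doB),
                Executors.callable(ParallelBlock::doC)));
        pool.shutdown();
        doD(); // runs only after A, B and C have all completed
        System.out.println(log);
    }
}
```

Which rather makes the post's point: the intent fits in four lines of bracket notation, but the Java version needs a thread pool, Callable wrappers and an explicit shutdown.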

The same idea could apply to for loops where you don't care in which order the iterations are performed.


for (int i = 1; i <= 10; i++)
{[
    doE(i);
]}
doF();


The calls to doE() may be executed in any order. doF() will be executed when all the calls to doE() have completed.
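Again as a sketch rather than the proposed notation, Java's parallel streams already offer something close to an order-independent loop (doE() and doF() are hypothetical; here doE(i) just adds i to a running total, which is the same whatever order the iterations run in):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class ParallelLoop {
    static final AtomicInteger total = new AtomicInteger();

    static void doE(int i) { total.addAndGet(i); } // order-independent work
    static void doF()      { System.out.println("total = " + total.get()); }

    public static void main(String[] args) {
        // Iterations may run in any order, possibly on several threads;
        // forEach returns only when every iteration has completed.
        IntStream.rangeClosed(1, 10).parallel().forEach(ParallelLoop::doE);
        doF(); // prints total = 55
    }
}
```

Note that this only works because doE(i) is safe to run concurrently; the bracket notation would carry exactly the same obligation.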

This means that whenever you write code, you can ask "Does it matter what order these statements are performed in?" If the answer is "No", the new notation lets the compiler and operating system perform them in parallel on separate threads whenever they deem it more efficient.

If you later find the parallelism is causing problems, you can easily revert to the sequential notation.

If you find code is running slowly, you can add the new notation and the operations may be performed in parallel.

Wednesday, April 5

Crystal Reports and an empty report

I've been trying to use Crystal Reports with Visual Studio 2005 and SQL Server 2005 Express. When I dragged the CrystalReportViewer control onto a Windows Form and added a very simple report to it, the report headings were displayed but no data appeared, and there were no error messages to explain why.

Reading around the net led me to think it was probably something to do with database security or firewall settings. However, after a lot of trial and error I found that I had to do the following:

1) Explicitly fill the datasets that were used in the report.
2) Go through each table in the report and set its data source.
3) Set the report source to the report with the modified tables.

The code I used was as follows:


private void MyForm_Load(object sender, EventArgs e)
{
    // 1) Explicitly fill the dataset used in the report.
    this.myTableAdapter1.Fill(this.myDataSet1.MYTABLE);

    myReport crtp = new myReport();

    // 2) Go through each table in the report and set its data source.
    foreach (CrystalDecisions.CrystalReports.Engine.Table tb
        in crtp.Database.Tables)
    {
        tb.SetDataSource(myDataSet1);
    }

    // 3) Point the viewer at the report with the modified tables.
    crystalReportViewer1.ReportSource = crtp;
}


So it was nothing to do with security settings after all.

Monday, April 3

Refactoring Databases

It's often a good idea to put as much logic as possible in the database layer: if you don't, you have to move more information between processes and machines, and moving data around is slow, can be unreliable, and is time-consuming to program.

Unfortunately, once you have a database structure in place, it can be difficult to change it. I posted a while ago about how a test-driven development approach can give you more confidence that any changes you make will work and won't break something you have already done.

I noticed today that Addison-Wesley have published a new book on refactoring databases by Scott Ambler and Pramod Sadalage.
I've ordered my copy, and there is a website at databaserefactoring.com that introduces the refactorings. I'll post a review once it has arrived and I've had a chance to read it.