Sunday, January 29

Distributed Search Engines

The recent controversy about Google limiting search results to people in China has highlighted that search engines are very influential, and subject to political and commercial pressures. Centralised search engines also mean that everyone has to go through a single point, which is vulnerable to attack or system failure.

This blog entry suggests how a search engine could be distributed across many machines. Each user would search a small part of the web, and the results would be shared. In some ways this is similar to recent trends for downloading music. People wishing to download music may previously have used a service such as Napster, which centrally maintained a list of songs. This was closed down due to its widespread use for infringing copyright, to be replaced by "peer to peer" systems, which are proving much harder to close down, both legally and technically.

None of the technology involved is likely to be difficult to implement. There are three components: an agent to create a local search index, a service to respond to requests for searches, and a search results viewer to query other machines and display the results.

Creating a local search index

You could run an agent on your machine. This will examine the pages you visit and index them. It will follow any pages linked to from these pages, and index these as well. The depth of spidering will be recorded, so pages that you visit are considered most important, and pages several clicks away less so.

The number of pages that will be indexed will depend on available bandwidth, processor resources and disk space. When disk space runs out, older pages are removed. Indexing pages should be a background task that does not perceptibly affect computer performance.
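As a rough sketch of what the agent's index might look like (an illustration only, not a finished design): the class below keeps an in-memory inverted index, records the spidering depth of each page, and drops the oldest pages when a limit is reached. The names `LocalIndex`, `add_page` and `max_pages` are made up for this example, and a page count stands in for the disk space limit.

```python
from collections import defaultdict
import re


class LocalIndex:
    """Toy inverted index: word -> set of URLs, plus per-page crawl depth."""

    def __init__(self, max_pages):
        self.max_pages = max_pages          # stand-in for "available disk space"
        self.pages = {}                     # url -> (depth, words), oldest first
        self.postings = defaultdict(set)    # word -> urls containing it

    def add_page(self, url, text, depth):
        """Index a page; depth 0 means the user visited it directly."""
        if url in self.pages:
            self._remove(url)               # re-indexing makes the page "new" again
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        self.pages[url] = (depth, words)
        for word in words:
            self.postings[word].add(url)
        # When space runs out, drop the oldest pages first.
        while len(self.pages) > self.max_pages:
            self._remove(next(iter(self.pages)))

    def _remove(self, url):
        _, words = self.pages.pop(url)
        for word in words:
            self.postings[word].discard(url)

    def search(self, query, limit=10):
        """Return up to `limit` (url, score) pairs, best first."""
        scores = defaultdict(float)
        for term in re.findall(r"[a-z0-9]+", query.lower()):
            for url in self.postings.get(term, ()):
                depth, _ = self.pages[url]
                scores[url] += 1.0 / (1 + depth)   # visited pages weigh most
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]
```

The `1 / (1 + depth)` weighting is just one simple way to make pages you visited count for more than pages several clicks away.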





Responding to requests for searches

You will need to run a service on your machine which will listen on a port. Someone can connect to the service and specify a search term. If they do, the following are returned to them:

a) Summaries of the top 10 matching pages in your index.
b) A list of your search partners.

When someone connects to your machine, their IP address will be logged and recorded in your “search partners” list. This will mean you can search their machine in future.
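The request handling could be sketched roughly as follows. This leaves out the networking and only shows the logic; `handle_search`, the dictionary-based index and the simple "count the occurrences" scoring are all assumptions made for the example.

```python
def handle_search(index, partners, requester_ip, term, limit=10):
    """Answer one search request and remember who asked.

    index:    dict mapping url -> page summary (hypothetical in-memory index)
    partners: list of IP addresses, updated in place
    """
    needle = term.lower()
    matches = []
    for url, summary in index.items():
        hits = summary.lower().count(needle)
        if hits:
            matches.append((hits, url, summary))
    matches.sort(key=lambda m: (-m[0], m[1]))   # most hits first, then by URL

    reply = {
        "results": [{"url": url, "summary": summary}
                    for _, url, summary in matches[:limit]],
        "partners": list(partners),   # snapshot taken before adding the requester
    }
    # Log the caller so that their machine can be searched in future.
    if requester_ip not in partners:
        partners.append(requester_ip)
    return reply
```

A real service would wrap this in something that listens on a port and serialises the reply; how that is done does not affect the idea.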





Search Results Viewer

When you want to search, you will run a desktop application, which displays a list of search results.

This will first look on your machine for pages matching your search criteria. It will then ask machines in your list of “search partners” if they have any pages matching your criteria. As pages are returned, they are added to the list displayed in the application, and the list is reordered to place the best matches (those that most accurately match your search words and those that are returned by lots of people) first.

It then recursively asks machines in the lists of search partners of your search partners, until it deems enough results have been returned or it has been trying long enough. If a partner of a search partner returns a result, it is added to your list of search partners.
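The partner-of-partner search described above can be sketched as a breadth-first walk. This is only an outline under some assumptions: `ask(peer, term)` is a hypothetical function that contacts one machine and returns its results and its partner list, the local lookup is left out, and "trying long enough" is approximated by a cap on the number of machines contacted.

```python
from collections import deque


def distributed_search(term, my_partners, ask, max_peers=50, enough=20):
    """Breadth-first search across partners, then partners of partners.

    ask(peer, term) is a hypothetical callable returning (results, partners):
    results maps url -> match score, partners is a list of peer addresses.
    """
    queue = deque(my_partners)
    seen = set(my_partners)
    best_score = {}     # url -> highest score any peer reported
    votes = {}          # url -> number of peers that returned it
    contacted = 0

    while queue and contacted < max_peers and len(best_score) < enough:
        peer = queue.popleft()
        contacted += 1
        try:
            results, partners = ask(peer, term)
        except OSError:
            continue    # unreachable peer: skip it
        for url, score in results.items():
            best_score[url] = max(score, best_score.get(url, 0.0))
            votes[url] = votes.get(url, 0) + 1
        if results and peer not in my_partners:
            my_partners.append(peer)    # a useful partner of a partner
        for other in partners:
            if other not in seen:
                seen.add(other)
                queue.append(other)

    # Pages returned by many people come first, then the closest matches.
    return sorted(best_score, key=lambda u: (-votes[u], -best_score[u], u))
```

Ranking by the number of peers first and the match score second is one possible reading of "best matches"; the two could equally be blended into a single score.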





Conclusion

As this will index pages that are actually visited, search results are more likely to be relevant. It will be more difficult for political or commercial pressures to influence search results, which is especially important as the internet becomes more widely available in countries with less than perfect records on free speech. The positions of sites will differ between searches, depending on which search partners could be contacted. This will mean people see a wider range of web sites, making it easier for smaller sites to become noticed.

Related Links

Hundred Dollar Laptop

Tuesday, January 24

Tab Orders

I've got an application with lots of dialog boxes. These have evolved over time and controls have been added and removed.

As it's coming to the stage where it will be released, it's time to tidy up things like tab orders. I found the following class from Scott McMaster saved a lot of time:

http://www.codeproject.com/dotnet/TabOrderManager.asp#xxxx

All I needed to do was add one line of code to each dialog box, and the tab orders now work correctly.

Monday, January 23

Toyota's Values vs. Agile Methodologies

An article in this week's Economist reminded me of how the Japanese car maker Toyota has a culture with five distinct values. Three of these sit well with agile. I think agile can learn from one, and Toyota should update one of these to reflect lessons from agile.

Kaizen (Continuous Improvement) – Each day Toyota employees come to work determined to become a little better at whatever it is they do.

This sits well with agile techniques, such as refactoring and pair programming. You are more likely to become better at whatever you do if you constantly learn from others and reflect on things that have gone right or wrong in the past and try and improve them.

Genchi genbutsu (Go to the source) – Find the facts and do not rely on hearsay. Gain consensus around well-supported arguments.

This sits well with agile methodologies, where the customer is closely integrated with the team. The closer the customer is to the team, the less likely it is that facts will be distorted between the customer and the developer.

Challenge – Problems are a way to improve performance, not something undesirable.

This too sits well with agile methodologies. If you do something that takes longer than expected or doesn't work out, you can try a different approach more easily than if you have to do something in a particular way to conform to a rigid framework. You can then reflect on what worked and what didn’t to improve performance.

Teamwork – Put the company’s interests before the individual and share knowledge with the team.

This may appear a great goal, but it doesn’t sit well with the concept of "drawing your own map", where each individual thinks about what they want from their career. Perhaps this should read "Align the company's interests with the individual's goals." If both sides can make compromises, the company will retain happier people who will ultimately be more profitable.

The other part of teamwork – "Share knowledge with the team" certainly sits well with agile.

Respect – Respect others' skills and knowledge. If two people always agree, one of them is superfluous.

Some environments that use agile methodologies seem to try and get everyone thinking in the same way (e.g. Test Driven Development is good, Java is good, Visual Basic is bad), and it's difficult to suggest something else. Agile needs to encourage more diverse thought and to regularly revisit these ideas. It should consider itself a science, not a religion.

Related posts

"Agile followers and Agile Derivers"
http://www.richardjonas.com/blog/2006/01/agile-followers-and-agile-derivers.html


Economist article – Inculcating culture – The Toyota Way http://www.economist.com/displaystory.cfm?story_id=E1_VPRDQGN
(this may not be available to non-subscribers).

Draw your own map - http://redsquirrel.com/dave/work/a2j/patterns/DrawYourOwnMap.html

Friday, January 20

Excel Column Selection

I've found a problem with selecting columns in Excel.

I tried using the following VBA script:

Columns("K:L").Select

This should select columns K and L. However, this selected columns F to W.

This was because there were merged cells on the spreadsheet: cells F4 to W4 were merged. Unmerging these cells allowed the columns to be selected correctly.

Wednesday, January 18

Agile Followers and Agile Derivers

Dave Churchville describes the split in the agile community between "followers" and "derivers". Followers adopt all the rules and best practices that they hear about. Derivers consider the basic principles of rapid feedback, inspection and adaptation, and select practices accordingly.

I agree with Dave that the principles of agile are more important than the practices, and you should consider your environment when selecting which practices to apply. You should also update your practices regularly. Agile should have adaptable practices as well as tasks.

I would suggest that the following four factors are triggers for reassessing, and possibly adapting, the process:

Changing People

Have the personnel on the project changed? What are their personal goals, and does the methodology match these goals?

Changing Technologies

Has the project moved to a stage where it uses different technologies? If so, which agile methods are most appropriate for these? Practices that are suitable for developing web pages in ASP.NET might be different to practices for developing applications to connect to a legacy mainframe system.

Changing Customers

Have the customers on the project changed? Are they still happy to work with developers?

Changing Environment

Has the political or legal environment changed?

Sunday, January 15

Arguments for and against using stored procedures

Frans Bouma states some arguments against using stored procedures, preferring to generate SQL dynamically in an application.

Coming to databases from an OO background, I'd suggest that the database should be considered an "object" with a defined interface. The implementation should be separate from the interface, and coupling between objects should be reduced as far as possible, and I’d use SP's mainly for this reason.

Ed Tittel suggests that technologies should be evaluated by considering performance, availability, security, scalability, maintainability, accessibility, deployability and extensibility. Judged against these criteria, I think SP's have advantages.

Performance – There may be some small improvements with SP's, as these can be pre-compiled. However, newer versions of SQL Server maintain a cache of execution plans of other queries, so any gains are likely to be small. You certainly don’t lose anything with SP's, however.

If you decide your application is too slow, and want to cache data to optimise things, you can do this in the database, and don't need to change your application.

Availability – N/A

Scalability – Separating the implementation from the interface allows the implementation to change. If, for example, you wanted to partition your data so it appeared across several servers, you could change your tables, change your stored procedures to use the new tables, but if you keep the interface to the stored procedures the same, you will not need to change your applications.

However, the more proprietary database technology you use, the more you are locked in to Microsoft technologies. Microsoft have some impressive performance figures for SQL Server on large-scale environments. However, sometimes when scaling a database up, you might want to move to something else, and using SP’s would make this change take longer.

Security – If the underlying tables are hidden, the risk that they can be accidentally or maliciously changed is reduced. Access to stored procedures can be granted to individuals or application roles who need it. However, they should not be seen as a technique to magically prevent SQL injection attacks.

Maintainability – It is probably easier to make small changes to an application that uses SQL to query the database directly. However, it is harder to ensure these changes will be correct and reliable. Using stored procedures allows Test Driven Development and refactoring to be used (see my earlier post here).

However, many developers do not understand how stored procedures work, and may develop them sub-optimally. You need to consider who is available to change these if they need to be maintained and what their skills and aspirations are.

Accessibility – N/A

Deployability – If the database is a self-contained object, with a limited number of methods (stored procedures), it is less risky to redeploy this object in isolation. The more ways that code can access the database, the more chance that something could go wrong, and the harder it will be to correct any problems.

Extensibility – Using SP's allows your application to be extended more easily without damaging existing functionality. If your application needs different information from a database, you can easily write another SP. You can run unit tests over your existing procedures to make sure that anything you have written does not break existing code.

If you want to write another application that uses the same data, you can reuse the SP’s, and any optimisations you have made to them.

Douglas Reilly has also looked at this here and has concluded that SP’s improve maintainability and performance (to a smaller extent).

I would use SP's at all times except where the following apply:
1) The database technology might change.
2) Developers and maintenance personnel do not understand SP's and are not able or willing to learn about them.
3) You are developing a very small project.
4) The project is a one-off development and unlikely to need maintenance.

Tuesday, January 10

Power, Bureaucracy, Relationships and Achievement

Esther Derby writes about how corporate culture can influence how successful agile processes are likely to be. She describes four types of culture: power, bureaucratic, achievement and relationship. Agile processes are most successful in achievement and relationship cultures.

I think the culture in an organisation often follows a cycle. Initially people want to achieve something. They may not achieve things as fast as some people perceive they can, so management takes control and introduces a "power" culture. The safe option is seen to be to write processes and procedures, leading to a "bureaucratic" culture. Eventually, people work together to work around these, as they feel that the processes don't help them get anything done. This leads to people feeling energised and wanting to achieve something, as shown below:





To help agile methods succeed, you need to keep the culture on the left of the diagram. To reduce the likelihood of a "power" culture being introduced, everything needs to be visible, so management can see that things are being done as efficiently as they can be.

If you're in a bureaucratic culture, you need to move it back to the "relationship" culture as quickly as possible. For this to happen, people need to feel that procedures can be challenged and that they won’t come to any harm if they do so.

Monday, January 9

People aren't fungible resources

I saw the following web site: www.poppendieck.com/overview.htm.

I'm concerned that the Microsoft Solutions Framework and Visual Studio 2005 Team System categorise people into interchangeable "roles" and not as individuals with their own interests and aspirations.

Reducing a project into a series of tasks to be undertaken by an interchangeable "developer" and a series of tasks to be undertaken by a "tester" may seem efficient, but it means that a developer undertaking a task does not develop an understanding of the project, and so cannot ensure it is as good as it can be for its users.

People also feel happier if they are in control of what they are doing, not being controlled. Happier "developers" care about their work and how it affects users.

It's important to know what is going on, but scheduling systems should track both the progress of a project and the progress of people towards their personal objectives, and then help suggest a direction, not control. Control may speed up initial development, but the overall quality will fall, leading to greater testing and support costs later.

Friday, January 6

Using Excel to create maintainable applications

It's occasionally easier to write an application using an Excel spreadsheet than using a conventional language. A spreadsheet is easy to send in an email and you don’t have to worry about installing things.

However, it's very easy to let normal good practices go out the window, and develop something that's impossible to maintain. Often a few macros are added to a spreadsheet that eventually becomes a full-blown application. Here are a few ways I've found to try and make something that is maintainable.
  • Treat each worksheet as an "object". As with any other object oriented design, there should be minimal coupling between objects. If there’s too much coupling, things get difficult to maintain very quickly.

  • It's tempting to use copy and paste for code reuse. If worksheets have common functionality, extract it into separate modules instead.

  • Try and create a 2 or 3 tier structure. With three tiers, have worksheets that contain data, worksheets that perform calculations on the data and worksheets that display the data to the user. With two tiers, have some worksheets that contain data, and others that perform calculations and apply presentation to the data. If you eventually decide to migrate to a conventional language, this structure will make the migration a lot easier.

  • Develop unit tests to call functions, and a separate worksheet to run these. It's possible to develop these first.

Patrick R. O'Beirne has some other thoughts on this at: http://www.exceluser.com/tools/agile1.htm

Do any readers have any other good practices for using Excel to create maintainable applications?

Tuesday, January 3

Elimination of the file system

Jeff Atwood writes that the file system should be eliminated, or at least hidden from the user. I agree with this, at least for some types of documents, such as word processor or spreadsheet files.

It’s not difficult to arrange a system to make it easier to find documents and keep track of versions in a corporate environment.

The current options on a file menu would be replaced by the following:

1) Save
2) Load
3) Email
4) Copy to
5) Revert to previous version

Save asks you for the following:

1) Who is allowed to read the document.
2) Who is allowed to change the document.
3) Optionally, tags to make it easier to find the document. These will typically be needed for things that can’t easily be indexed such as pictures.

It does not ask for a file name or directory. It then records the document, keeping all existing versions (disk space is less than 30p/GB now). This will be stored on a server drive somewhere, but the details of the file name don’t matter to the user. If it’s more efficient to store files in e.g. a database or combine documents into the same physical file, then the system could do this transparently.

Load asks you for some keywords, and remembers keywords that you and others have used. It then presents a list of files that you are allowed to open, in a similar format to a search engine. The search should be weighted by:

1) How closely the keywords match the document.
2) How recently edited the document is.
3) How close a user is to you. Something written by someone in your department would appear higher in the list than something written by someone the other side of the world. Weightings could be calculated automatically from how often you send and receive emails from that colleague.
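One possible way to combine these three weightings into a single score is sketched below. The function name, the 0.6/0.2/0.2 weights and the 30-day half-life are all invented for illustration; a real system would need to tune them.

```python
from datetime import datetime, timedelta


def load_score(keyword_match, last_edited, closeness, now,
               weights=(0.6, 0.2, 0.2), half_life_days=30.0):
    """Combine the three factors into one score between 0 and 1.

    keyword_match and closeness are expected to be in the range 0..1.
    """
    age_days = max((now - last_edited).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)   # 1.0 today, 0.5 after one half-life
    w_match, w_recent, w_close = weights
    return w_match * keyword_match + w_recent * recency + w_close * closeness


# Usage: two equally relevant documents edited a week ago; the one written
# by a close colleague should rank above the one from a distant office.
now = datetime(2006, 1, 3)
last_week = now - timedelta(days=7)
nearby = load_score(0.8, last_week, closeness=0.9, now=now)
far_away = load_score(0.8, last_week, closeness=0.1, now=now)
```

The closeness value itself could come from email traffic, for example the number of emails exchanged with the author divided by the number exchanged with your most frequent correspondent.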

The list would include who last edited the file, and when it was edited.

It will then load the most recent version of that document, avoiding the risk that different people are reading different versions, and users will not have to learn any document management system.

Email will let you copy the document to an email message. The system could keep track of this, so if someone were to leak a confidential file, the person responsible could be found.

Copy to will let you copy the document to another device, e.g. a USB drive. Again, the system could keep track of this.

Revert to previous version will show the different versions of the document and allow you to go back to a previous version.
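To make the Save, Load and Revert behaviour concrete, here is a minimal in-memory sketch. It is only an illustration: the `DocumentStore` class and its method names are hypothetical, the permissions are simple sets of user names, and it assumes anyone who can change a document can also read it.

```python
class DocumentStore:
    """In-memory stand-in for the server that keeps every version."""

    def __init__(self):
        self._history = {}   # doc_id -> list of versions, oldest first
        self._next_id = 1

    def save(self, user, content, readers, editors, doc_id=None):
        """Store a new version; no file name or directory is involved."""
        if doc_id is None:
            doc_id = self._next_id
            self._next_id += 1
            self._history[doc_id] = []
        elif user not in self._history[doc_id][-1]["editors"]:
            raise PermissionError(f"{user} may not change document {doc_id}")
        self._history[doc_id].append(
            {"content": content, "readers": set(readers), "editors": set(editors)}
        )
        return doc_id

    def load(self, user, doc_id):
        """Always return the most recent version."""
        latest = self._history[doc_id][-1]
        if user not in latest["readers"] | latest["editors"]:
            raise PermissionError(f"{user} may not read document {doc_id}")
        return latest["content"]

    def versions(self, doc_id):
        """Number of stored versions, for a 'revert' picker."""
        return len(self._history[doc_id])

    def revert(self, user, doc_id, version):
        """Re-save an older version (0 = first) as the newest one."""
        old = self._history[doc_id][version]
        return self.save(user, old["content"], old["readers"], old["editors"], doc_id)
```

Reverting by saving the old content again, rather than deleting newer versions, keeps the full history intact, which fits the idea that disk space is cheap.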