Building Database-Backed Web Applications: Process and Issues
In this article, the process of implementing database-backed Web applications from initial planning to final implementation and beyond will be explored. Both technical and "people" issues will be addressed, including choosing hardware and database platforms; selection of development tools and approaches; making decisions on the allocation of time, personnel, and equipment; and assigning data and system administration tasks. 1 It is necessary to consider these points, even if a project seems simple, to ensure that no issues have been overlooked. Carefully thinking through all aspects beforehand can minimize the number of problems one will encounter later.
There is a growing body of literature on database-backed Web pages, both on general topics and specific platforms. 2 The library press is also beginning to publish in this area, including articles by Antleman, Mischo and Schlembach, Kiehl, Summers, and Westman. 3 There are several clear advantages of using a database rather than relying solely on static HTML pages: (1) it allows one to focus on inputting rather than formatting data; (2) it reduces the time required to maintain Web pages; and (3) it permits the reuse of information in a variety of situations and to serve a multiplicity purposes. However, when implementing this technology, one needs to undertake careful planning and weigh the costs versus the benefits. One will need to address a number of issues and make a number of decisions. Knowing what to expect will help to ensure success.
Planning
Deciding What You Want
When beginning, it is important to define the desired goal of the project and determine whether a database is the appropriate tool for that goal. Database-backed applications generally:
- use structured data where output via different parameters such as subject or material type is desired;
- enable searching of the data by various parameters;
- manage constantly changing data; and
- allow the reuse of information in different contexts and pages.
Get Everybody on Board
Once it is determined that a database is the appropriate tool for the job, it is necessary to get buy-in from two areas, administration and staff, before proceeding. Database-backed Web projects can consume significant staffing resources, may require a substantial financial investment in the short-run, and involve ongoing support in the long run. It is therefore critical that the library administration understands and fully supports the project. To obtain this understanding and support, answers must be provided to a number of questions:
- What will be the benefit of this project? Answers can include saving time, providing better access to information, or allowing for more efficient workflow.
- What will the costs be in terms of money and staff time? What will need to be dropped to do this? What new hardware, software, and personnel will be needed? Can it be done with existing infrastructure?
- How is this project to be accomplished? What is the timeframe for its implementation? What are the steps to be taken and when should each step be completed? What expertise is needed? Does such expertise exist in-house or can it be acquired?
- How much time and effort will it take to maintain this application? How much data inputting and updating will be involved? Will it be worth the investment? What subsequent programming and technical resources will be needed?
It is also important to identify the persons who will fill various roles in the project. Note that one person may perform more than one role. Necessary functions include:
- Project coordinator-the person who oversees the project and makes sure things stay on-track
- Data users-the people who will help define the data and the functionality to be included in the application
- Designer-the person who finds out what data are needed and then creates the required data models
- Database administrator-the person who implements the data model and who administers the database management system (often the same person as the designer)
- Developer/programmer-the person who programs the application
- Testers-users who test the application 4
The project coordinator, by having the answers to these questions when approaching the library administration, can provide the administration with the information needed to make an informed decision. Having these answers makes certain that the coordinator has thought the issues through, thus ensuring the project's success.
Staff members also need to support the project. Since the new system will involve changes in their work, they need to understand the reasons for the changes and to support them. In particular, they need to be shown how this new application will make their work easier. By involving staff in the application from the beginning, they will have an investment in its implementation.
Think Big, Start Small
When starting the project, one should think big but start small. Thinking big means looking at the project from a larger perspective. What other uses might there be for the proposed database? Are there other types of information that could benefit from being made available in such a manner? If so, what are the relationships to the project under consideration? It is necessary to ask these sorts of questions to help assure that the project will work well with future products that the library might develop as well as permitting easy integration of new features into the proposed application at a later time.
Starting small means undertaking a project that realistically can be accomplished. This type of project may involve many people and will require planning, coordination, and communication. Staff will need to allocate time for evaluating, purchasing, and installing the hardware and software required to create the application. Starting small allows staff to define issues and to learn how to use and develop the tools and techniques needed for successful implementation. It also provides an idea of ongoing maintenance requirements and allows for gradual integration of the final product into the library environment.
Note that, while thinking big ensures the product will be flexible enough to support future development, it is important not to create false or unrealistic expectations. Getting people excited about possibilities that cannot be fulfilled can create frustration that could endanger the acceptance of the current application and possibly engender resistance to any new projects.
Once the project is defined, the project coordinator should establish a timeline to provide structure to the development process. This timeline should list each task to be done and indicate the order in which the tasks should be accomplished. It should also include realistic target dates with the names of those responsible for each task's completion. This schedule is valuable in making important time, money, and personnel allocations, as well as helping to keep the project on track.
Defining the System
The next step is to define the application. As Greenspun points out, this requires one to:
take [the] publishing idea . . . and apply three standards steps:
- Develop a data model-Define what information is going to be stored and how it should be represented
- Develop a collection of legal transactions on that model, e.g., inserts and updates
- Come up with a set of Web forms that let the user build up to one of those transactions. 5
Develop a Data Model
Data modeling is perhaps the most critical part of the entire process and one that is too often overlooked. The designer will save a lot of time and avoid many problems and frustrations by focusing on proper data modeling before attempting implementation. While a full description of data modeling is beyond the scope of this article, there are several good books on the topic. One of them is Hernandez's Database Design for Mere Mortals. 6 In clear, nontechnical language, he gives a step-by-step process for doing interviews to obtain needed data elements from users; organizing those elements into groups and setting up relationships between the groups; and defining the constraints on the data (rules to which data elements input into the system must conform). To assist the reader, he provides example scenarios at each step. Because his focus is on design principles rather than specific implementation or programming techniques, the book can be useful, no matter what infrastructure may be used for the project.
Briefly put, data modeling is the process of defining all of the data to be included in the system and then structuring them to allow their easy and efficient maintenance and retrieval. To obtain this information, the designer talks with staff members that will work with the data (both inputting and using) to find out what data are needed and how they are to be used. This process includes asking a number of questions, such as:
- What data are needed?
- How do the data interrelate and how should they be organized?
- What sorts of outputs (both Web pages and printed reports) are needed?
- Who should be able to enter and access this data?
- How can the design be made flexible enough to encompass future needs?
In the process, certain information about the data needs to be obtained. Questions that should be asked about each item include:
- What type of datum (text, numeric, date) is it? How big?
- What types of constraints should apply to it? For example, should users be able to enter anything for this item or should they select from a predefined list of possible entries?
- Should the user be able to search on it?
- Does it need to be repeatable?
- Is it mandatory? Optional?
- Should access be restricted? If so, to whom?
The designer then groups the data elements groups and defines how each group relates to the others. Once this has been done, the results should be verified with the staff to determine if there are mistakes or whether additional data elements need to be included. This process goes on iteratively until the designer feels that all of the basic pieces have been discovered and entered into the model.
As part of the modeling process, the designer needs to know how users will want to retrieve information from the database and what types of output are desired. What fields will they need to be able to search by? What fields should be displayed? Is all desired content included? Is it organized in a logical manner? Will the proposed model make it easy to retrieve the data? What sorts of reports will be needed and how will they be done? The designer needs to ask these questions both to optimize the model to support all desired searches and to validate the data structures that have been developed.
Define Interactions
Once the designer feels that a fairly complete model has been achieved, it is necessary to define the application processes needed to implement the system, and then to determine each step (interaction) within those processes required to get data into and out of the database. For each interaction, the designer defines the data elements involved and creates the structured query language (SQL) queries required to enter or retrieve information into or from wherever it resides in the proposed model. 7 This ensures that all data elements are present and validates (or show weaknesses in) the proposed model.
A process may involve multiple tasks. For example, adding a URL into a Web collection database may involve entering basic information for the site, assigning it subject headings, providing online licensing and support data on the resource, and entering public help-screen information. The designer may wish to break the process up and assign individual steps to different persons. In developing the system, one should ask:
- Can this process be broken down into individual tasks?
- Who will do each task?
- How do these tasks fit into the existing workflow?
- If a task doesn't fit in well, can it be changed? Can the workflow be changed?
- What security levels should be assigned to each interaction?
By breaking processes down in this way, the designer is able to build a high degree of flexibility into the system and permits tasks to be reassigned as library structures and workflows change.
Define the User Interface
Along with defining the interactions, it is necessary to design the HTML forms to be used in implementing each interaction within the application. These screens provide the primary window into the data; their design needs to be clear, allowing for easy navigation and use. Therefore, user input is crucial. The developer should create mockups of the application's inputting, editing, and searching/outputting screens and show them to users for feedback. User input can then be folded back into the screens' design. The process goes on iteratively until agreement is reached.
This iteration process may go back to any of the three steps listed above: data modeling, interaction definition, or screen design. The screen layout may change based on user input, allowing the user to view data and see a model of the final application elements and processes in ways that are more familiar. This then allows the user to see where additional data elements (or even processes) are required. Taking this approach ensures that the most complete data model is developed and allows for the building of an application that integrates well into the library workflow.
It is important that the designer and user(s) come to complete agreement on all three of these steps before proceeding with implementation. Adding further data fields or changing how the system works after this point could easily lead to significant delays due to the large amounts of time needed for data restructuring and additional programming or reprogramming. A generally accepted practice is that once agreement has been reached, a written document (containing documentation of the data models, transactions, interface screens) is created and all parties sign off on it. Any further changes are then subject to separate negotiations as to the possible effects of the proposed changes on the project's timeline.
Implementation
Select a Hardware Platform
For the most part, a full range of database and interface tools is available for both Windows and Unix platforms. While a library may already have hardware in place that could do the job, there are a number of points the project coordinator should consider when deciding on a hardware platform and determining whether existing equipment could be used:
- Servers must be adequate to support the new application. Issues such as disk storage space, required RAM, 24/7 availability, and supportability need to be considered.
- How many simultaneous users need to be supported? Larger numbers of users mean more multiple simultaneous queries, requiring more robust infrastructure.
- Is the proposed hardware scalable? Is it powerful enough to support future growth (both in terms of number of applications and number of users)?
- What kinds of data are involved? Graphic images require more bandwidth and utilize more storage space than text.
- Do the pages need to be created each time a user requests the page? Dynamic Web pages, while allowing for real-time availability of new information, can exact a serious performance price for often-accessed information on heavily used servers.
- How complex will the data structures be? If the database has a large number of tables with multiple relations, it will take more computer processing time to get the answer to an information request than if only a few tables are involved.
- What other applications are running on the same computer? What are their requirements?
In looking at these issues, one should note that the database management system does not need to reside on the same computer as the one that runs the Web server. Tools such as Database Interface (DBI), Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), and others makes it possible to place the database on one computer and the Web server on another and then to establish communication between the two. 8 This allows the load to be distributed and permits the use of a different platform for the database than is used for the Web server.
When selecting a machine and operating system on which to run the database, there are some rules of thumb that the coordinator should consider when choosing between NT and Unix:
- Scalability-Either NT or Unix will be more than adequate for less complex or fewer numbers of applications. However, given that NT runs slower on a given Intel machine than Linux does, the latter is significantly more scalable (can support more activity) on a per-dollar basis.
- Affordability-Probably the most affordable solution would be a Pentium server running Linux. With the long tradition of sharing of Unix programs (particularly in the open source movement), one can easily assemble a fully functioning, fully featured database/Web server for around $1,000. NT on Intel is probably the next most affordable option. However, given scalability issues as well as licensing costs, NT will require a greater financial outlay.
- Staffing-NT is clearly the better-known operating system and, given the wealth of tools written for NT and the greater likelihood of finding NT expertise in-house, it probably has the edge in this area. However it is good to consider alternative approaches. In particular, the growing popularity of Linux means that it may not be as difficult as one might think to find Unix support at an affordable price.
- Ease of development-Traditionally this has been one of NT's strengths. However, with the increase in the number of development tools becoming available on both platforms, development on Unix/Linux servers is becoming easier.
- Security-Both operating systems need to be made secure, but there are tools to help the system administrator do so. While most flavors of Unix have a reputation for not being secure, the increasing number of exploits aimed at NT-based servers (particularly against Microsoft's IIS Web server) makes this point a draw. 9
In deciding which platform to use, one should balance the pros and cons in each of these areas, giving each the proper weight based on the needs and resources of the particular institution.
Select a Data Management System
Although there are numerous ways to store data, the use of a good relational database management system (RDBMS) is recommended. An RDBMS not only provides a means of storing the data, it also provides built-in mechanisms (APIs and SQL) for adding, editing, and deleting records, as well as searching and outputting data. 10 RDBMS products are available at three levels: free, workstation (inexpensive), and enterprise (expensive) database systems. 11 Table 1 provides a list of some of the more popular RDBMS products that can be used for Web database publishing. 12
Table 1. Database Products
| Product |
Wintel |
Unix |
| Free | ||
FrontBase |
Yes |
Yes |
MySQL |
Yes |
Yes |
mSQL |
No |
Yes |
PostgreSQL |
Yes |
Yes |
| Desktop | ||
FilemakerPro |
Yes |
No |
Microsoft Access |
Yes |
No |
FourthDimension |
Yes |
No |
MS SQL-Server |
Yes |
No |
| Enterprise | ||
DB2 |
No |
Yes |
Oracle |
Yes |
Yes |
Informix |
Yes |
Yes |
Sybase |
Yes |
Yes |
The project coordinator should consider several parameters when deciding which RDBMS product to use:
- What products are already available in-house?
- What can be afforded?
- What features are desired?
- What expertise is available?
- Is the desired product scalable? 13
When considering cost, one should look beyond just the price of the product in question. If certain features are needed, it may ultimately not be cost effective to purchase a less expensive product and then have staff program in the missing features. Not only could that end up costing a great deal more in terms of staff time, the features may not work correctly, creating problems that would not have existed had the more feature-filled product been chosen. In other words, the more expensive product may end up being less expensive in the long run.
When using an RDBMS, one needs to address the matter of who will administer the database. Database administration responsibilities include setting up and administering the RDBMS; creating databases and tables; and developing, implementing, documenting, and maintaining data models implemented within the RDBMS. Duties also include backing up (and restoring) databases and tables; importing and exporting data; ensuring database security and integrity; and working with programmers to get data into and out of the system. As the person who understands and maintains all the databases and how all of the data within them interrelate, the administrator is the one responsible for trouble-shooting any database problems that occur.
If it is decided that the scalability, flexibility, and features of an enterprise RDBMS are required, then an administrator who is able to support those features will be needed. This person would be responsible for properly configuring the system-defining adequate swap space, defining file mappings on various disk drives, setting block sizes-and undertaking performance tuning. In addition, given that these packages are very large and complex, this person would need to deal with bugs that will inevitably be encountered by developing work- arounds or installing patches. Failure to have expertise available that can take on these kinds of tasks could result in significant down-time and major data problems. Thus, if an enterprise RDBMS is required for a project, a competent database administrator needs to be included in the financial calculations. 14
Select Development Tools
It is necessary to decide on which program development tools to use to create the application. Tools come in four types: programming languages, server-side scripting languages, application servers, and programs that create static pages. The first involves a programmer writing a CGI application using a traditional programming language (such as C/C++, perl, or VisualBasic) to get data in and out of the RDBMS. 15 While this approach provides the most flexibility and power, it has several drawbacks:
- It is extremely expensive in terms of time
- It requires a much greater level of developer expertise
- It involves more lines of code (with a proportionately greater possibility for bugs)
- It scales poorly for multiple or large projects in terms of development costs
Alternately, one can decide to use a server-side scripting language such as mod_php, VBScript, or Active Server Pages (ASP) (see table 2). 16 This approach involves writing code directly into HTML documents rather than as stand-alone programs. While server-side scripting does require some programming expertise, it is relatively minimal. In addition, developers have a number of resources to assist them, including books, function libraries, user's groups, and discussion forums. Also, since it utilizes HTML-based pages, it allows nonsystems staff to be involved in the program development process.
Table 2. Scripting Tools and Application Servers 17
|
Product |
Language |
Servers supported |
Windows |
| ASP |
VBScript |
Microsoft IIS |
Yes |
| Apache::ASP |
perl |
Apache |
Yes |
| ColdFusion |
CFML |
Apache, Microsoft IIS, Netscape |
Yes |
| EmbPerl |
perl |
Apache |
Yes |
| mod_php |
php |
Apache, Microsoft IIS, Netscape |
Yes |
A closely related approach, using an application server such as ColdFusion, also requires programming expertise, but somewhat less than the previous two. Database interactions are carried out through the use of ColdFusion Markup Languages (CFML) tags (which look like HTML tags) embedded in the HTML document. In addition, ColdFusion permits development of new tags if those that come with the program are not adequate.
Finally, one can use tools such as Access2000, GDIdb, and others to take a database and output HTML pages that can then be moved to the Web server. 18 However, with the exception of GDIdb, this approach requires that each page be manually created and published to the Web site. Since this approach requires a pre-extant database, the developer may still need to write data maintenance routines to add and edit data, particularly for multitable applications.
Coding
Once the design process is completed, the necessary hardware, RDBMS, and development software have been installed and configured, and the data models have been implemented in the database, one can proceed to actual code development. This part of the process involves creating the HTML forms; creating code to process the data from those forms to build appropriate SQL queries; connecting to the RDBMS server and sending queries and retrieving results; and formatting query results returned by the database.
While the specifics on how to code vary according to the development tool selected, there are certain tool-independent rules of thumb to make the process easier. These include:
- Make sure that HTML form-variable names are the same as the name of the field into which the data will be placed. This makes programming and debugging the application much easier.
- Break the application down into sections and then program and test each section individually.
- Where the same task is done often, consider creating a function and putting that function into a separate file to be included as needed. This allows this same code to be used in multiple applications, thereby speeding up development of new projects and making it easier to maintain multiple applications.
- Investigate code libraries, script collections, and other Web programming sites as well as checking with colleagues. Often what one is trying to do has been done before and the developer has made it available. Significant amounts of time can be saved by not reinventing the wheel.
- Document the program as it is being developed by writing comments and explanations into the code itself. This is critical to the ongoing support of the application and one that is all-too-often ignored. There are two rules of undocumented code: The first is that no programmer will immediately understand someone else's code. The second is that no programmer will easily understand his or her own code six months later. Without proper code annotations, any programmer who needs to change or fix a section of code may need to relearn the entire program in order to properly fix it, resulting in significant down-time. 19
An important part of the development process is to eliminate bugs (program glitches) before putting the product out for general use. As noted above, part of the planning process is defining the application program flow. The developer should use that information to develop formal testing procedures to verify that the program is working correctly. Then the developer separately tests each transaction with the database, verifying that each data element is being added and updated properly, and that all desired constraints are being observed by the application. By formally defining and performing these testing procedures, the developer will save much time and frustration down the road.
Once the application has gone through these testing procedures, it is ready for end users to try out. While problems may still occur at this point, they should be relatively minor. There are a number of things to remember about the process of end-user testing:
- The developer should work directly with the testers so that, as problems are found, it is possible to see what is happening and, if feasible, fix the problem on the spot
- There should be enough data in the database for users to test all desired functionality
- Searchers should know what to search on to get actual results
- As noted above, this is not the time to change things. Any changes to the system should be put off until a subsequent version
Production
Once the application has been fully tested, it can be put it into production. Note that this is not an all-or-nothing decision. For example, one may create data maintenance routines first and leave other aspects, such as public access modules, for later implementation. In such cases, a copy should be made both of the application code and the database and put into a separate testing area, where they can then be used for further system development. Under no circumstances (aside from fixing bugs) should work be done on the production version. Otherwise, the result will be an unstable (and for the user a very frustrating) application.
Note that, once a complete application is in production, the work does not end. Bugs will turn up, features won't work as expected, fields may need to be added or changed, and new features or enhancements may be desired. One should therefore plan for ongoing application support.
As parts of the program are implemented, developers should provide documentation on how to use the application. While the staff, at least initially, will be fairly knowledgeable due to their experience in the development process, they may leave or forget what they have learned. Having documentation, no matter how easy the application may seem, can save great amounts of time and frustration for new users trying to learn the system.
Issues
Data Backups
Among the ongoing tasks that will need to be done, none are more important than doing regular backups. However, backing up a database, while seemingly a simple matter, can be a huge disaster waiting to happen if not done correctly. If a user submits an interaction that modifies data in the database while the database is being backed up, the result could be a backup containing pre-interaction (unchanged) content in some tables and post-interaction (changed) contents in the others. The result would be a backup that would be unusable in case of an emergency. As Greenspun puts it: "Be afraid. Be very afraid. Standard Unix or Windows NT file system backups will not leave a consistent and therefore restorable database on tape." 20
There are three ways around this problem:
- Take the database offline (or put a read-only lock on it) before the backup is started and then put it back online (or take the read-only lock off) when the backup is completed.
- Put the database into read-only mode and do a complete dump of the data to disk and then take the lock off again. Due to the potential amount of disk space required, this is the least scalable of the three solutions.
- Use specialized backup software that will maintain the integrity of the data while keeping the database up and running 21
Given that most library applications do not require the ability to do data maintenance on a 24/7 basis, it should not be a problem to take the first course. However, it is important to be aware of the issue and to plan accordingly.
Since backups are usually done only once a day, applications should log all interactions that update the database to a journal file. This logging may be done either by the database itself or the developer may code that feature into the application. Not only does having a journal file make trouble-shooting significantly easier, it means that if there is a database crash, the administrator can restore the database from the backup system and then reapply any subsequent transactions from the log files to bring the system back up to date.
Data Integrity
In writing data maintenance routines, the programmers need to ensure data integrity-making sure that, as information is added, modified, and deleted, the information in the database remains reliable. Failure to maintain this integrity can create a data disaster. There are a number of ways in which data integrity can be compromised:
- Two people trying to make changes to the same record at the same time
- Editing routines that do not properly maintain links between related records
- Programs that end without completing the creation or elimination of links between records
One can address this problem in a number of ways:
- Locking-by locking data-affected areas in the database when an interaction is taking place, no other interactions against that data area can be undertaken until the first one has completed. This locking can be done either at the table or the record level, depending on RDBMS capabilities. Code usually needs to be included in the application to invoke these capabilities.
- Transactions-transaction is a technique where the RDBMS waits until all queries have successfully completed, at which point they are committed. If there is a problem, a rollback can be performed to return the database to where it was before the transaction was attempted. Most of the databases mentioned above provide transaction support.
- Foreign key integrity support-In a relational database, two tables are related if records in each table have fields that share a common value. If the value in the common field in one table changes, the records in the other table's common field containing that value must also have the value changed or the relationship will be broken. RDBMS products with foreign-key integrity are able to automatically maintain these relationships without requiring additional programming. If the RDBMS does not provide this feature, the programmer will need to code it into the application. 22
One further data-integrity issue that the developer needs to address is that of validating user-data entry. Even though an application works correctly, users may create data problems by inputting anomalous data into fields, such as entering letters into a number field or mistyping a word. There are a number of approaches that can address this type of problem:
- Rather than typing information in, one can create option lists, checkboxes, or radio buttons from which users select an entry.
- Insert code that checks user input. If they have input incorrect information (or even no information), the program can then warn the user and not complete the interaction until the problem is corrected.
- Automatically enter certain types of information, such as the user's name (from their login information) and the current (system) date, into the database.
Security
As with any Internet-accessible application, security should be a primary concern in database-backed Web applications. Developers address security at three levels: system, database, and browser/server communication. While system security is beyond the scope of this article, the other two are not; both have issues associated with them. These issues center around two core concepts: authentication and authorization. Authentication lets the application know that the person attempting to access it is the person she or he claims to be. Authorization is of verifying that that person is allowed to perform the requested action.
While one probably won't authenticate end users going to the application's public pages, one should do so for those maintaining the database. This authentication can be done in one of three ways:
- Securing the data maintenance directory using the Web server's security mechanism
- Using the RDMBS's security tables
- Using tables within the application database for authorization
The advantage of the first, and simplest, option is that it allows the developer to utilize an already-developed security system. Since the Web server authenticates the user, the developer doesn't need to build that type of functionality into the application. However, it has the disadvantage of not supporting authorization. Once users are authenticated, they all have the same rights within the application. If that is not desirable, another approach will be needed.
Most RDBMS products provide support for user authorization in that their security tables can associate different permissions (such as Select, Update, and Delete) within the RDBMS with different users at the database, table, and even field level. In other words, one may allow certain users to add records to one set of fields while allowing them only searching capability with another set. The drawback of using the RDBMS security tables is that maintaining these tables can be awkward and could lead to security problems if one is dealing with numerous users.
The third option is set up a table that will contain the user's username, password, and access levels(s)- admin, student, supervisor, and so on. A user who starts to work with the application is then presented with a login screen on which to enter this information. The application would then look the information up in the table for authentication. If a match is found, the user is provided with access to the application and the access levels are noted. Then, each time an interaction with the database is attempted, the application compares the user's access levels with those required to perform the interaction. If there is a match, the user is allowed. If there is not a match, the user is disallowed.
For this third option to work, the user's information needs to be available to every part of the application. Since the Web is stateless (each transaction with a Web server is a discrete event with the server usually keeping no information between transactions), some means of passing the user's information between the pages is needed. This can be done in the following ways:
- Put the information into hidden tags that can be passed from page to page and use the HTTP_REFERER environment variable to make sure that information passed is coming from a trusted source. The problem is that it can be very tedious to make sure that each page receives and passes the information on correctly.
- Once the user is authenticated, set a cookie that contains the user information and pass it from page to page. A user attempting to access a page without the proper cookie is taken directly to the authentication form.
- Some development tools (such as php) support the concept of sessions and, within those sessions, the creation of session variables. Once these session variables are set, they can be accessed without having to explicitly pass them between pages (and, if they are not available, the user is forced to authenticate). The one caveat here is to make sure that only the application (and not malicious users) can set these session variables.
The third security area concerns the communication between the user's Web browser and the Web server. To protect the database from unauthorized access, one should explore implementing a secure sockets layer (SSL) on the Web server. 23 SSL causes the browser to send authentication information to the server in an encrypted form. If this is not done, hackers with freely available software could grab this information and use it to make unauthorized changes to the database.
Communication
Finally, one must develop and maintain good lines of communication between both the library administration and staff members involved in the process. This communication is critical to ensuring (and maintaining) the success of the project. Given that there can be delays, it is important that the development team provide regular and frequent updates to both groups on how the project is going and inform them of any problems or delays that arise. That kind of feedback helps to maintain interest and assists in keeping support for the project from evaporating.
Conclusions
Using a database can provide a multitude of benefits that make it well worth the investment of time, energy, and resources. It allows for easier Web-site maintenance, more efficient use of time, and a greater ability to keep data current. However, the output of the system will only be as good as the time, planning, and care that go into the system's creation. By taking the time to do things right rather than quickly, developers ensure that the investments made will be richly repaid many times over. Anticipating challenges and making the right decisions allows one to avoid many problems. Thus, by utilizing the expertise, vision, and energies of the library staff, the development team can create a product that will be useful for years to come and reflect the qualities of the staff who made it happen.
References and Notes
1. Due to the predominance of various flavors of Unix and Windows (as well as the author's expertise), this article will concentrate primarily on those platforms. Note that, in addition to using the Unix tools contained in this article, Macintosh users can use native Mac OS applications to create database-backed Web pages. Accessed Apr. 12, 2002, www.nisto.com/mac/tool/database/html.
2. Philip Greenspun, Philip and Alex's Guide to Web Publishing (San Francisco: Morgan Kaufmann Pub., 1999.) (Also available on the Web at: www.arsdigita.com/books/panda); Luke Welling and Laura Thompson, PHP and MySQL Web Development. (Indianapolis: Sams, 2001); Jim Buyens, Web Database Development Step by Step (Redmond, Wash.: Microsoft Pr., 2001); Micah Brown and Mike Frederick, Essential Cold Fusion 4.5 for Web Professionals (Paramus, N.J.: Prentice Hall, 2001).
3. Kristin Antelman, "Getting Out of the HTML Business: The Database-Driven Web-Site Solution," Information Technology and Libraries 18, no. 4 (Dec., 1999): 17681; William H. Mischo and Mary C. Schlembach, "Web-Based Access to Locally Developed Databases," Library Computing 18, no. 1 (1999): 5158; and Carole A. Kiehl and Edward H. Summers, "Comprehensive Access to Periodicals: A Database Solution," Library Collections, Acquisitions, and Technical Services 24 (spring 2000): 3344; and Stephen Westman, "Database-backed Library Web Pages," The Electronic Library 19, no. 6 (Dec. 2001), 42431 Antelman's article provides a good overview of database-backed Web pages.
4. While this group may include data users, it should also include others who have never seen the system before.
5. Greenspun, p. 313. Greenspun's use of the term "transaction" is potentially confusing, due to the fact that the word has a particular meaning in RDBMS systems. Therefore, this article will use the term "interactions" when referring to this particular concept.
6. Michael J. Hernandez, Database Design for Mere Mortals: A Hands-on Guide to Relational Database Design (Reading, Mass.: Addison-Wesley, 1997).
7. The term SQL is the acronym for "structured query language," which was first described in 1970 by E. F. Codd. The relational database model has quickly become industry standard for databases ranging from Microsoft Access for desktop applications to products such as Oracle which manage the data for huge multinational corporations. Codd's article is available in Michael Stonebreaker's Readings in Database Systems (San Mateo, Calif.: Morgan-Kaufmann Pubs., 1988). A good introduction to SQL by Philip Greenspun can be found at www.arsdigita.com/books/sql, accessed Apr. 12, 2002.
8. DBI (or Database Interfaces) are perl modules that provide a perl-based interface to a wide variety of RDBMS platforms. Accessed Apr. 12, 2002, http://dbi.perl.org/index.html contains a list of modules. Accessed Apr. 12, 2002, www.perl.com/pub/1999/10/DBI.html has more information available; open database connectivity (ODBC) is a standard protocol for accessing different databases residing on different servers and platforms; JDBC (or Java Database Connectivity) is similar to ODBC and is an API developed by Sun Microsystems for use with their Java language. It provides Java programmers a means for integrating database queries to databases residing on the same or other servers into their programs. Accessed Apr. 12, 2002, http://java.sun.com/products/jdbc provides more information.
9. A notable exception being OpenBSD which claims "Three years without a remote hole in the default install." Accessed Apr. 12, 2002, www.openbsd.org.
10. Applications Programming Interface (API)-this is the set of hooks the RDBMS vendor provides to allow interactions to occur between the RDBMS and outside programs.
11. These categories are borrowed from Roy Tennant, "A Database for Every Need," Library Journal 124, no. 20 (Dec. 1999): 41,43.
12. More information on and downloads of these products are available: Accessed Apr. 12, 2002, FrontBase is available from: www.frontbase.com; accessed Apr. 12, 2002, mSQL at www.hughes.com.au; accessed Apr. 12, 2002, MySQL at: www.mysql.com; accessed Apr. 12, 2002, PostgreSQL at www.postgresql.org; accessed Apr. 12, 2002, Filemaker Pro at www.filemaker.com; accessed Apr. 12, 2002, Fourth Dimension at www.acius.com; accessed Apr. 12, 2002, Lasso from www.blueworld.com; accessed Apr. 12, 2002, MS-Access at www.Microsoft.com/office/access/default.htm; accessed Apr. 12, 2002, SQL Server at www.Microsoft.com/sql/default.htm); accessed Apr. 12, 2002, DB2 at www-4.ibm.com/software/data; accessed Apr. 12, 2002, Informix at www-4.ibm.com/software/data; accessed Apr. 12, 2002, Oracle at www.oracle.com; and accessed Apr. 12, 2002, Sybase at www.sybase.com; PostgreSQL, on Windows systems require the use of Cygwin. accessed Apr. 12, 2002, http://sources.redhat.com/cygwin.
13. One source for side-by-side feature comparisons, accessed Apr. 12, 2002, is www.mysql.com/information/crash-me.php. John Ashenfelter, Choosing a Database for Your Web Site (New York: Wiley, 1998) also discusses a variety of Web tools, focusing on Access and Filemaker Pro.
14. A good overview Oracle database administration is provided by Rick Greenwald, Robert Stackowiak, and Jonathan Stern, Oracle Essentials: Oracle 8 and Oracle 8i (Sebastopol, Calif.: O'Reilly and Assoc., 1999).
15. Perl is a very popular scripting language that is used extensively in Web programming, particularly CGI applications. Accessed Apr. 12, 2002, www.perl.org for more information; accessed Apr. 12, 2002, http://msdn.Microsoft.com/vbasic will provide information on Microsoft's VisualBasic.
16. Mod_php is an Apache server module that uses php, an open source scripting language that can be used on multiple platforms and can be integrated into an Apache server. It allows HTML page creators to embed commands that can search a database and then output the results directly into their pages. Accessed Apr. 12, 2002, www.php.net for more information on php. Accessed Apr. 12, 2002, http://hotwired.lycos.com/Webmonkey/databases/tutorials/tutorial4.html for a tutorial on how to include php searches in an HTML page. Note that php4 allows the same type of functionality to be included using Netscape and Microsoft servers using NSAPI and ISAPI respectively. Active Server Pages (ASP) is a Microsoft scripting language that works with their Internet Information Server. Accessed Apr. 12, 2002, http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/default.htm for information. ASP, while originally a Microsoft technology using VisualBasic/VBScript, has been extended for use on other platforms, such as the Apache::ASP port, which is multiplatform. Information on the Microsoft product is available at http://msdn.microsoft.com/library/default.asp?url=/nhp/Default.asp?contented=28000522 and at www.nodeworks.com/asp/ for the Apache product, accessed Apr. 12, 2002,
17. Apache requires the use of mod_perl for Apache::ASP. ColdFusion is a Web application server from Allaire that incorporates visual programming, as well as database and debugging tools, available in versions for Solaris, NT, and Linux. It uses its own proprietary language for integrating database support into HTML pages. Accessed Apr. 12, 2002, www.allaire.com/products/coldfusion for information. A free, limited-functionality version (for Linux and NT), ColdFusion Express, is available. CFML (ColdFusion Markup Language) available for Linux and Solaris only.
18. GDIdb is a Windows-based automated application that provides scripting that permits the creation of HTML files from any ODBC-compliant database. The company offers two versions, GDIdb and GDIdbPro, the latter stating that it can interactively receive and publish information in real time. Accessed Apr. 12, 2002, www.gdidb.com for information. See also Ernest Perez, "GDIdb: Getting Your Databases on the Web, with No Big Deal," Library Computing 18, no.1 (1999): 2935. For other approaches, see www.tucows.com/htmlacc95.html, accessed Apr. 12, 2002.
19. While any good programming book should contain information on commenting code, a particularly good example can be found in Tobias Ratschiller and Till Gerken, Web Application Development with PHP4.0 (Indianapolis: New Rider's Pubs., 2000).
21. There are a number of these types of products. In addition to those that come with the enterprise database servers, manufacturers of commercial backup software include packages that support major RDBMS vendors. These include Veritas Software's NetBackup, accessed Apr., 12, 2002, www.veritas.com; Legato, accessed Apr., 12, 2002, www.legato.com; and Sun's Solstice Backup accessed Apr., 12, 2002, www.sun.com/solstice/backup that provide this support.
22. Note that this is an important difference between lower- and higher-priced systems. If the system utilizes table-level locking, this means that nobody else can modify any record within an affected table until the process is completed, whereas row-level locking only affects the rows involved in the process. While this may not be a problem for many library programs, it could pose a problem for applications where there could be a great deal of data maintenance occurring concurrently. See Greenspun, 35253 for caveats on this approach.
23. Secured Sockets Layer-a means by which transactions between users and Web servers can be made secure via certificates, and use encryption between the user and the server to ensure that transaction data are not visible to snoopers on the Web. Accessed Apr., 12, 2002, http://searchSecurity.techtarget.com/SDefinition/0,,sid14_gci343029,00.html for more information. Most commercial Web servers provide this capability. Users of Apache can have two alternatives, both accessed Apr., 12, 2002, www.modssl.org and www.apache-ssl.org.
Stephen Westman ( westman.2@osu.edu) is Digital Resources Systems Administrator for The Ohio State University Libraries, Columbus.