ADVERTISING on Timetable World

Timetable World costs money to run and to expand, mainly for hosting and book-scanning services. We will run relevant advertising to help defray these costs, and we also seek sponsorship and other donations.

HOW TIMETABLE WORLD IS PRODUCED

The Main Elements

This section describes the approach taken to developing Timetable World and the technologies used. In general, we use open-source software, but we outsource most image capture to a specialist book-scanning company and use a commercial web-hosting service. Users are encouraged to comment and to suggest better alternatives.

In summary, the main elements are:

  • Capturing electronic images, including aligning and removing book curvature;
  • Image post-processing, such as compression, combination into strips, and marking up hotspots on maps;
  • Optical character recognition of selected pages;
  • Index preparation and database construction;
  • Web-page development. This covers the textual elements and the database retrieval and rendering functionality;
  • Building links and establishing connectivity with other online services.

Image Capture

We use a specialist book-scanning service because it has the expensive equipment required to produce high-resolution scans quickly and economically, without damaging the structure of the book. It can also handle large pull-out maps in a single pass. The images are straightened and the natural curvature of each page is compensated for. Doing these steps on a home-use flatbed scanner would be very time-consuming and still would not achieve the same quality.

We choose to capture in JPEG format. Timetable World performs a lot of post-processing, and JPEG is easily converted into other target files and formats. A resolution of 400 x 400 dpi seems to be the optimum. The original documents can be slightly blurred and printed on poor-quality paper, so higher resolutions capture more noise, which confuses the OCR process.

Image Post-Processing

Images are compressed using a sophisticated open-source graphics package called ImageMagick, which works well for handling large numbers of files in batches. Individual pages are joined into strips of up to 50 pages, each strip being a logical "chapter" covering a specific timetable, railroad or section. Preparing the images in advance like this reduces the processing required at query time.
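
The batching step can be sketched as follows. This is a minimal illustration, not the actual Timetable World script: it assumes that a chapter longer than 50 pages is split into consecutive strips, that pages are stacked top to bottom (ImageMagick's `-append`; `+append` would join them side by side), and that the JPEG quality setting and the output file names are hypothetical. The commands are only built, not executed.

```python
from pathlib import Path

MAX_PAGES_PER_STRIP = 50  # upper limit stated in the text


def chunk_pages(pages, size=MAX_PAGES_PER_STRIP):
    """Split an ordered list of page files into groups of at most `size`."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def strip_commands(pages, chapter, quality=60):
    """Build one ImageMagick command per strip (returned as lists, not run).

    `-append` stacks the input images vertically and `-quality` sets the
    JPEG compression level. The quality value and the naming scheme for the
    output files are assumptions made for this sketch.
    """
    commands = []
    for n, group in enumerate(chunk_pages(pages), start=1):
        out = f"{chapter}_strip{n:02d}.jpg"
        commands.append(
            ["convert", *map(str, group), "-append", "-quality", str(quality), out]
        )
    return commands


if __name__ == "__main__":
    # 120 hypothetical page scans become strips of 50, 50 and 20 pages.
    pages = [Path(f"page{i:03d}.jpg") for i in range(1, 121)]
    for cmd in strip_commands(pages, "chapter"):
        print(cmd[0], f"({len(cmd) - 5} pages)", "->", cmd[-1])
```

Each command list could be passed to `subprocess.run` on a machine where ImageMagick is installed; newer ImageMagick releases name the executable `magick` rather than `convert`.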

Another open-source graphics package, called The GIMP, is used when special handling is required for individual images, such as cutting out sections. MapEdit is a low-cost commercial product that makes adding hotspots to images, usually maps, very straightforward.

Optical Character Recognition

OCR is applied to index pages to help prepare the Timetable World index. For this we use a commercial package, OmniPage. OCR captures lists of stations, railroads and the like, but it is not as good as a human reader. Noise arising from blurry printing and from old-fashioned and sans-serif fonts causes many errors, and the symbols widely used in timetables are not recognised. So we use algorithms developed by Timetable World, together with manual eyeballing, to tidy up the raw OCR data. At this stage, analysing the elements of individual timetables, such as the distance charts, the footnotes and the services themselves, seems too ambitious!
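
To give a flavour of the programmatic tidy-up, here is a minimal sketch. It is not Timetable World's own algorithm: it assumes index lines of the form "station name, leader dots, table number", and the table of look-alike characters is hypothetical. Lines that do not fit the pattern are returned as `None` so that they can be checked by eye.

```python
import re

# Hypothetical look-alike characters that OCR may emit in place of digits.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "l": "1", "I": "1", "S": "5"})

# Two or more leader dots (optionally spaced out) between name and number.
LEADER_DOTS = re.compile(r"\s*(?:[.\u00b7]\s*){2,}")

# A name ending in a non-digit, whitespace, then a short number-like token.
ENTRY = re.compile(r"^(?P<name>.*?\D)\s+(?P<num>[0-9OlIS]{1,4})$")


def parse_index_line(line):
    """Split a raw OCR index line into (name, table number), or return None."""
    cleaned = LEADER_DOTS.sub(" ", line)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    match = ENTRY.match(cleaned)
    if not match:
        return None  # leave for manual checking
    name = match.group("name").strip(" ,")
    number = int(match.group("num").translate(DIGIT_LOOKALIKES))
    return name, number


if __name__ == "__main__":
    for raw in ["Aberdeen ........ 12", "Crewe . . . . 1O4", "GENERAL INDEX"]:
        print(repr(raw), "->", parse_index_line(raw))
```

A heuristic like this will still misread some lines, which is why the manual pass remains necessary.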

Index Preparation and Database Construction

Timetable World relies on a database to index the images. We use the open-source database MySQL, which is provided as standard in most web-hosting packages.
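
As an illustration of what such an index might look like, the sketch below maps index entries to pre-built strips. Everything in it is an assumption made for the example: the table and column names are hypothetical, the sample rows are invented, and Python's built-in SQLite module stands in for MySQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one row per strip, one row per index entry.
SCHEMA = """
CREATE TABLE strips (
    strip_id   INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,   -- the "chapter" the strip covers
    image_file TEXT NOT NULL    -- file name of the pre-built strip image
);
CREATE TABLE index_entries (
    entry    TEXT NOT NULL,     -- station or railroad name from the index
    strip_id INTEGER NOT NULL REFERENCES strips(strip_id),
    page     INTEGER NOT NULL   -- position of the page within the strip
);
CREATE INDEX idx_entry ON index_entries(entry);
"""


def build_demo_db():
    """Create an in-memory database holding a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO strips VALUES (1, 'Table 12', 'table12_strip01.jpg')"
    )
    conn.executemany(
        "INSERT INTO index_entries VALUES (?, ?, ?)",
        [("Aberdeen", 1, 3), ("Crewe", 1, 7)],
    )
    return conn


def find_pages(conn, entry):
    """Return (title, image_file, page) rows for one index entry."""
    cursor = conn.execute(
        "SELECT s.title, s.image_file, e.page "
        "FROM index_entries AS e JOIN strips AS s ON s.strip_id = e.strip_id "
        "WHERE e.entry = ? ORDER BY s.strip_id, e.page",
        (entry,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(find_pages(build_demo_db(), "Crewe"))
```

The lookup uses a parameterised query, so a search term typed by a user is never pasted directly into the SQL text.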

In addition to the online database, there are a number of offline databases involved in index preparation. We make extensive use of Microsoft Excel and the open-source scripting language PHP during index preparation, and find that Microsoft Access works well as an ODBC client to MySQL for bulk-loading data.

Web-page Development

HTML and CSS are the framework for any website. In addition, Timetable World uses PHP, a scripting language, to query the underlying MySQL database in response to end-user requests and to render the results.
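
The rendering half of that request cycle might look roughly like this. The site itself uses PHP; Python is used here purely for illustration, and the `viewer` URL with its `strip` and `page` parameters is hypothetical. The point of the sketch is that every value taken from the user or the database is escaped before it is written into the page.

```python
from html import escape
from urllib.parse import urlencode


def render_results(query, rows):
    """Return an HTML fragment listing the strips that match a search.

    `rows` is a list of (title, strip_id, page) tuples, as a database query
    might return them. The link target is an assumption for this sketch.
    """
    heading = f"<h2>Results for {escape(query)}</h2>"
    if not rows:
        return heading + "\n<p>No matching timetables.</p>"
    items = []
    for title, strip_id, page in rows:
        href = "viewer?" + urlencode({"strip": strip_id, "page": page})
        items.append(
            f'  <li><a href="{escape(href)}">{escape(title)}, page {page}</a></li>'
        )
    return heading + "\n<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_results("Crewe", [("Table 12", 1, 7)]))
```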

Google Maps Integration

A published API enables Google Maps to be embedded within free-to-use websites. The API uses JavaScript and requires Timetable World to use a unique access key. PHP is used to write the JavaScript dynamically in response to user requests.
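
The idea of writing JavaScript dynamically can be sketched as below. Again Python stands in for PHP, and several things are assumptions: the function name `map_script`, the shape of the station records, the zoom level, and the use of the `google.maps.Map` and `google.maps.Marker` constructors from the current Maps JavaScript API, which may differ from the API version the site was built against. The access key is supplied when the API script itself is loaded and is not shown here.

```python
import json


def map_script(element_id, stations, zoom=8):
    """Return JavaScript source that draws one marker per station.

    `stations` is a non-empty list of dicts with `name`, `lat` and `lng`
    keys (a hypothetical record shape). json.dumps produces valid JavaScript
    literals, so names containing quotes cannot break the generated code.
    """
    if not stations:
        raise ValueError("at least one station is required")
    center = {"lat": stations[0]["lat"], "lng": stations[0]["lng"]}
    target = json.dumps(element_id)
    options = json.dumps({"center": center, "zoom": zoom})
    # Escape "</" so the data cannot close an inline <script> element early.
    data = json.dumps(stations).replace("</", "<\\/")
    return (
        "function initMap() {\n"
        f"  var map = new google.maps.Map(document.getElementById({target}), {options});\n"
        f"  var stations = {data};\n"
        "  stations.forEach(function (s) {\n"
        "    new google.maps.Marker({\n"
        "      position: {lat: s.lat, lng: s.lng}, map: map, title: s.name\n"
        "    });\n"
        "  });\n"
        "}\n"
    )


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    print(map_script("map", [{"name": "Station A", "lat": 53.0, "lng": -2.5}]))
```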