CCP14 Homepage - Single Crystal and Powder Diffraction - CCP14 Web/Config Administration Information

[CCP14 Home: (Frames \| No Frames)] CCP14 Mirrors: [UK] \| [CA] \| [US] \| [AU]	What's New	Introduction	Site Map
Search the CCP14	Download Programs What do you want to do? (lists of software by crystallographic method)	Tutorials	Solutions

(This Webpage Page in No Frames Mode)

Collaborative Computational Project Number 14

for Single Crystal and Powder Diffraction

CCP14

Isearch 1.20 Indexing and Search Engine

The CCP14 Homepage is at http://www.ccp14.ac.uk


This software is no longer used on the CCP14 website; refer to the SWISH-E Search Engine page instead.

(Before waffling on (Jan 2001): another, newer alternative to Isearch is the Open Source SWISH-E Search Engine, which seems to have a lot more flexibility for a CCP14 style of website and is very fast. The index can be created either with a web spider or from the file system. It is simpler to get going than Isearch, but again, its robustness for very large web areas is not proven compared to Isearch/Iindex.)

The Isearch Indexing and Search Engine is free and available for UNIX; plus:

Is a free text search engine.
Can handle a large site - 100's of Megs to Gigs of information.
Recognises a number of data formats, including HTML and text.
Provided you have a C compiler (gcc) and a C++ compiler (g++) installed, it is relatively easy to compile, install and set up (no big deals encountered).
Has a nice feature that generates default forms pages.
Compact database format.
Providing you update the Isearch database after you update your web files/site, seems to be quite accurate and good at finding relevant hits.

Also refer to:

"How to write comments suitable for automatic software indexing",
P. DiFelice, G. Fonzi,
Journal of Systems and Software, 1998, Vol.42, No.1, pp.17-28

Where to get Isearch

The Isearch website is at http://www.cnidr.org/ir/isearch.html.

The Isearch ftp download area is at ftp://ftp.cnidr.org/pub/software/Isearch/.

Also be aware that there seem to be other distributions and information at:

Freely available at http://www.cnidr.org/ir/isearch.html
Modern binaries at ftp://ftp.awcubed.com/Software/
At http://www.etymon.com/pub/software/Isearch/
At http://www.etymon.com/Isearch/
At http://www.fgdc.gov/clearinghouse/clearinghouse.html
The Isite Information System (isite/isearch/iindex)
- Old overview description at http://herkules.oulu.fi/~z/isitehtm.html

Compiling and Installing Isearch

This assumes you have GCC/G++ installed. Many GCC installations do not have G++ installed, and there are extra libraries that have to be compiled in; refer to Compiling the GNU GCC/G++ C compiler for information on how to do this. On a decent workstation this should not be a problem: the ./Configure program should detect your system, making compilation a relatively trivial job.

Extract the Isearch-1.20.06.tar.gz file using the command gzip -d < Isearch-1.20.06.tar.gz | tar xvof -
Change directory into the Isearch-1.20.06 directory.
Run the ./Configure program (it should detect your system and set things up OK)
Type make to compile Isearch
- CGI/Web executables are in the Isearch-cgi directory
- General executables are in the bin directory.
Type make install to copy the general executables into /usr/local/bin. Normally you have to be the system administrator to do this, unless you have been given permission to install programs in this area under your user account.
Also put the CGI/Web executables into the cgi-bin area designated by your web server setup (e.g., for Apache 1.3.x, /usr/local/etc/httpd/cgi-bin).
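The build and install steps above can be collected into one small sh function, sketched below under some assumptions: ISEARCH_VERSION, CGI_BIN and DRY_RUN are hypothetical variables (not part of the Isearch distribution), and only the three CGI programs named in the wrapper scripts further down (isrch_srch, isrch_html, isrch_fetch) are copied into cgi-bin. Setting DRY_RUN=1 prints the commands instead of running them, which is a cheap way to check the sequence before doing it for real.

```shell
#!/bin/sh
# Sketch of the compile/install steps for Isearch 1.20.
# ISEARCH_VERSION, CGI_BIN and DRY_RUN are hypothetical settings;
# adjust CGI_BIN to the cgi-bin area of your own web server.

ISEARCH_VERSION=${ISEARCH_VERSION:-1.20.06}
CGI_BIN=${CGI_BIN:-/usr/local/etc/httpd/cgi-bin}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell so the cd does not leak to the caller.
build_isearch() (
    src="Isearch-$ISEARCH_VERSION"

    # Unpack with gzip + tar, so GNU tar's -z option is not required.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "gzip -d < $src.tar.gz | tar xvof -"
    else
        gzip -d < "$src.tar.gz" | tar xvof - || exit 1
    fi

    run cd "$src"    || exit 1
    run ./Configure  || exit 1   # detects the local system setup
    run make         || exit 1   # builds bin/ and Isearch-cgi/
    run make install || exit 1   # usually needs administrator rights
    # Copy the CGI programs used by the wrapper scripts into cgi-bin.
    run cp Isearch-cgi/isrch_srch Isearch-cgi/isrch_html \
        Isearch-cgi/isrch_fetch "$CGI_BIN/" || exit 1
)
```

Typical use is a dry run first (DRY_RUN=1 build_isearch), then a real run as the administrator from the directory holding the tarball.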

Deciding What to Index

In the case of the CCP14, all HTML, HTM and relevant text files are indexed for each virtual domain (www, alife, programming, netlib, gnu, etc.). With possible regional mirroring in mind, it was decided to keep a separate index for each domain.

Config Files

For the CCP14, because there are a variety of different "virtual domains", each with its own search database, the config files for each search database are put in their own cgi-bin directory.

In this case, for the ccp14web index, the three config files (ifetch, ihtml and the search script that runs isrch_srch) are put in /usr/local/etc/httpd/cgi-bin/ccp14/, as designated by the Apache 1.3.x configuration setup. The CGI executables themselves are in /usr/local/etc/httpd/cgi-bin/, so that the different virtual domains (www, netlib, gnu, programming, alife) all use the same executables.

/usr/local/etc/httpd/cgi-bin/ccp14/ifetch config file

#!/bin/sh
# From this script, run the isrch_fetch utility and pass 4 arguments:
# the database directory, followed by $1 $2 $3.
#
# For example:
#
# /path/to/Isearch-cgi/isrch_fetch /path/to/my/databases $1 $2 $3

exec /usr/local/apache/share/cgi-bin/isrch_fetch /web_disc/ccp14/web_area/isearch/ccp14web $1 $2 $3

/usr/local/apache/share/cgi-bin/ccp14/ihtml config file

#!/bin/sh
# From this script, run the isrch_html utility and pass a single argument
# that is the directory where your databases are stored.
#
# For example:
#
# /path/to/Isearch-cgi/isrch_html /path/to/my/databases

exec /usr/local/apache/share/cgi-bin/isrch_html /web_disc/ccp14/web_area/isearch/ccp14web

/usr/local/apache/share/cgi-bin/ccp14/ search script config file (runs isrch_srch)

#!/bin/sh
# From this script, run the isrch_srch utility and pass a single argument
# that is the directory where your databases are stored.
#
# For example:
#
# /path/to/Isearch-cgi/isrch_srch /path/to/my/databases

exec /usr/local/apache/share/cgi-bin/isrch_srch /web_disc/ccp14/web_area/isearch/ccp14web
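Since every virtual domain needs the same three wrapper scripts, differing only in the database path, they can be generated rather than edited by hand. The sketch below writes wrappers that match the ones shown above. Assumptions: make_wrappers and write_wrapper are hypothetical helpers, CGI_BIN is a hypothetical variable defaulting to the path used in these scripts, and the file name isearch for the isrch_srch wrapper is a guess; use whatever names your generated search forms actually call.

```shell
#!/bin/sh
# Sketch: generate the per-domain Isearch wrapper scripts.
# CGI_BIN and the wrapper name "isearch" are assumptions.

CGI_BIN=${CGI_BIN:-/usr/local/apache/share/cgi-bin}

# write_wrapper FILE PROGRAM DBDIR [ARGS]
# Writes an executable /bin/sh script that execs PROGRAM on DBDIR,
# followed by the literal ARGS text (e.g. '$1 $2 $3' for isrch_fetch).
write_wrapper() {
    ww_file=$1 ww_prog=$2 ww_db=$3 ww_args=${4:-}
    {
        echo '#!/bin/sh'
        echo "exec $CGI_BIN/$ww_prog $ww_db${ww_args:+ $ww_args}"
    } > "$ww_file" && chmod 755 "$ww_file"
}

# make_wrappers DOMAIN_CGI_DIR DBDIR
make_wrappers() {
    mw_dir=$1 mw_db=$2
    mkdir -p "$mw_dir" || return 1
    # Single quotes keep $1 $2 $3 literal, as in the ifetch script above.
    write_wrapper "$mw_dir/ifetch"  isrch_fetch "$mw_db" '$1 $2 $3' &&
    write_wrapper "$mw_dir/ihtml"   isrch_html  "$mw_db" &&
    write_wrapper "$mw_dir/isearch" isrch_srch  "$mw_db"
}
```

For example, make_wrappers /usr/local/apache/share/cgi-bin/ccp14 /web_disc/ccp14/web_area/isearch/ccp14web would recreate the ccp14web set; repeating it per domain keeps all wrappers pointing at the one shared set of executables.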

Daily Auto-Indexing: Creation of the Search Database

As automirroring of webpages runs between 1am and 5am each morning using WGET, the Iindex database must be regenerated after the auto-mirroring session so that it reflects the changes. While an incremental update is feasible using the "-a" option, subscribers to the Isearch mailing list recommend simply regenerating the database from scratch in this circumstance.

Note: If the cron script does not seem to be working, check that you have either specified the full path to Iindex or that its directory is in cron's default PATH.

In the .crontab file (which can then be passed into the crontab using the command crontab .crontab), put the script file that is going to be run after the automirroring. In this case, the script will run each morning at 5.07am.

07 05 * * * ./isearch.index.script
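The first two crontab fields are the minute and then the hour, so a 5.07am start is written "07 05", not "05 07" (which would mean 7.05am). The hypothetical helper below, not part of cron, builds such a line from an hour and minute so the field order cannot be swapped by accident.

```shell
#!/bin/sh
# Sketch: print a daily crontab entry. cron_line is a hypothetical helper.
# crontab field order: minute hour day-of-month month day-of-week command

# cron_line HOUR MINUTE COMMAND
cron_line() {
    printf '%02d %02d * * * %s\n' "$2" "$1" "$3"
}
```

For example, cron_line 5 7 ./isearch.index.script > .crontab followed by crontab .crontab installs the 5.07am entry. Note that crontab .crontab replaces the whole of the user's existing crontab, and cron usually starts jobs in the home directory with a minimal PATH, hence the full path advice above.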

The crontab entry calls a script file that regenerates each index using the recommended method (generating a file listing all the files to be indexed, then running Iindex on this file), then moves the new index over the old one so as to reduce the downtime of the index to a fraction of a second. The last line emails the report file (to ccp14@ccp14.ac.uk in the script below), confirming that the script has run and when each index was completed.
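The build-then-swap step that the script below repeats for every domain can be sketched on its own as a POSIX sh function. swap_index is a hypothetical name; the directory layout (a freshly built BASE/temp replacing BASE/NAME) follows the script. Because mv within one file system is a rename, the only time without a live index is the gap between the two mv calls.

```shell
#!/bin/sh
# Sketch of the "build in temp, then swap" step used in the script.
# swap_index is a hypothetical helper; layout: BASE/temp -> BASE/NAME.

# swap_index BASE NAME
swap_index() {
    base=$1 name=$2
    [ -d "$base/temp" ] || return 1           # nothing new was built
    if [ -e "$base/$name" ]; then
        mv "$base/$name" "$base/$name.old" || return 1
    fi
    # The live index is missing only between these two mv calls.
    mv "$base/temp" "$base/$name" || return 1
    rm -rf "$base/$name.old"                  # discard the old index
}
```

For example, swap_index web_area/isearch ccp14web corresponds to the three mv/rm lines that follow the ccp14web Iindex runs.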

#!/bin/csh

# You should CHANGE THE NEXT 3 LINES to suit your local setup
setenv  LOGDIR   ./web_area/mirrorbin/logs    # directory for storing logs
setenv  PROGDIR  ./web_area/mirrorbin         # location of executable
setenv  PUTDIR   ./web_area/web_live/ccp      # relative directory for mirroring
                               # relative due to possible kludge in wget
  #can change to absolute if you wish - some internal links may not work

set DATE=(`date`)
sed "/START_Iindex/s/NOT_FINISHED/Regeneration_STARTED $DATE/" ./report-template.txt  > ./report.txt.new
mv report.txt.new report.txt


rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/web_live/ -name "*.html"      -type f -print > web_area/isearch/tmpfile.txt
find web_area/web_live/ -name  "*.htm"      -type f -print >> web_area/isearch/tmpfile.txt
find web_area/web_live/ -name "*.txt"       -type f -print > web_area/isearch/tmpfile.txt2
find web_area/web_live/ -name "readme.1st"  -type f -print >> web_area/isearch/tmpfile.txt2
find web_area/web_live/ -name "readme.2nd"  -type f -print >> web_area/isearch/tmpfile.txt2
grep -v Ray-Tracing-News web_area/isearch/tmpfile.txt > web_area/isearch/tmpfile.txta
grep -v CCP14-by-OS web_area/isearch/tmpfile.txt2 > web_area/isearch/tmpfile.txt2a
grep -v ccp14-by-program web_area/isearch/tmpfile.txt2a > web_area/isearch/tmpfile.txt2b
/usr/local/bin/Iindex -d web_area/isearch/temp/ccp14web -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txta >  web_area/isearch/summary.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/ccp14web -m 16 -t SIMPLE -a -f web_area/isearch/tmpfile.txt2b >> web_area/isearch/summary.txt
mv web_area/isearch/ccp14web web_area/isearch/ccp14webold
mv web_area/isearch/temp web_area/isearch/ccp14web
rm -rf web_area/isearch/ccp14webold

#  In csh, use >& (or >>& when appending) instead of > to send standard error to the file as well.

rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/xrd/web/ -name "*.html"      -type f -print > web_area/isearch/tmpfile.txt 
find web_area/xrd/web/ -name  "*.htm"      -type f -print >> web_area/isearch/tmpfile.txt
grep -v web_stats web_area/isearch/tmpfile.txt > web_area/isearch/tmpfile.txta
/usr/local/bin/Iindex -d web_area/isearch/temp/wwwxrd -m 16 -t SGMLTAG -f web_area/isearch/tmpfile.txta >  web_area/isearch/summary.txt
mv web_area/isearch/wwwxrd web_area/isearch/wwwxrdold
mv web_area/isearch/temp web_area/isearch/wwwxrd
rm -rf web_area/isearch/wwwxrdold

set DATE=(`date`)
sed "/WWWXRD_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt  > report.txt.new
mv   report.txt.new   report.txt


set DATE=(`date`)
sed "/WWW_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" ./report.txt  > ./report.txt.new
mv report.txt.new report.txt

rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/programming/ -name "*.html"   -type f -print > web_area/isearch/tmpfile.programming.txt
find web_area/programming/ -name  "*.htm"   -type f -print >> web_area/isearch/tmpfile.programming.txt
find web_area/programming/ -name "*.txt"    -type f -print > web_area/isearch/tmpfile2.programming.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/programming -m 15 -t SGMLTAG -f web_area/isearch/tmpfile.programming.txt >  web_area/isearch/summary.txt 
/usr/local/bin/Iindex -d web_area/isearch/temp/programming -m 15 -t SIMPLE -a -f web_area/isearch/tmpfile2.programming.txt >>  web_area/isearch/summary.txt 
mv web_area/isearch/programming web_area/isearch/progwebold
mv web_area/isearch/temp web_area/isearch/programming
rm -rf web_area/isearch/progwebold

set DATE=(`date`)
sed "/PROGRAMMING_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt  > report.txt.new
mv   report.txt.new   report.txt 

rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/alife/ -name "*.html"   -type f -print > web_area/isearch/tmpfile.alife.txt
find web_area/alife/ -name  "*.htm"   -type f -print >> web_area/isearch/tmpfile.alife.txt
find web_area/alife/ -name "*.txt"    -type f -print > web_area/isearch/tmpfile2.alife.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/alife -m 15 -t SGMLTAG -f web_area/isearch/tmpfile.alife.txt >  web_area/isearch/summary.txt 
/usr/local/bin/Iindex -d web_area/isearch/temp/alife -m 15 -t SIMPLE -a -f web_area/isearch/tmpfile2.alife.txt  >>  web_area/isearch/summary.txt 
mv web_area/isearch/alife web_area/isearch/alifewebold
mv web_area/isearch/temp web_area/isearch/alife
rm -rf web_area/isearch/alifewebold

set DATE=(`date`)
sed "/ALIFE__Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt  > report.txt.new
mv   report.txt.new   report.txt 


rm -rf web_area/isearch/temp
mkdir web_area/isearch/temp
rm -f web_area/isearch/*.txt*
find web_area/netlib/ -name "*.html"   -type f -print >  web_area/isearch/tmpfile.netlib.html.txt
find web_area/netlib/ -name  "*.htm"   -type f -print >> web_area/isearch/tmpfile.netlib.html.txt
find web_area/netlib/ -name "*.txt"    -type f -print >  web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "readme"   -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.c"      -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.src"    -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.f"      -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "manual"   -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "manlc"    -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "helplc"   -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "imsl"     -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "nag"      -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "port"     -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "siam"     -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "index"    -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "doc"      -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "source"   -type f -print >> web_area/isearch/tmpfile.netlib.txt
find web_area/netlib/ -name "*.text"   -type f -print >> web_area/isearch/tmpfile.netlib.txt
/usr/local/bin/Iindex -d web_area/isearch/temp/netlib -m 25 -t SGMLTAG -f    web_area/isearch/tmpfile.netlib.html.txt >  web_area/isearch/summary.txt 
/usr/local/bin/Iindex -d web_area/isearch/temp/netlib -m 25 -t SIMPLE -a -f  web_area/isearch/tmpfile.netlib.txt >>  web_area/isearch/summary.txt 
mv web_area/isearch/netlib web_area/isearch/netlibwebold
mv web_area/isearch/temp web_area/isearch/netlib
rm -rf web_area/isearch/netlibwebold

set DATE=(`date`)
sed "/NETLIB_Iindex/s/NOT_FINISHED/Regeneration_COMPLETED $DATE/" report.txt  > report.txt.new
mv   report.txt.new   report.txt

/usr/sbin/Mail -s "Isite_Isearch_Creation_Results `date`" ccp14@ccp14.ac.uk < ./report.txt
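Most of the script above consists of one find per file pattern, appended into a list and then filtered with grep -v. The sketch below folds this into a single find call per list, joining the patterns with -o. list_files is a hypothetical sh helper (the script itself is csh, so it would be used from an sh script), and the patterns and exclusions in the example are simply those from the script.

```shell
#!/bin/sh
# Sketch: build an Iindex input file list with one find call.
# list_files is a hypothetical helper, not part of Isearch.

# list_files DIR EXCLUDE_REGEX PATTERN...
# Prints regular files under DIR whose names match any PATTERN,
# dropping paths that match EXCLUDE_REGEX ('' means no exclusion).
list_files() {
    dir=$1 exclude=$2
    shift 2
    [ $# -gt 0 ] || return 1
    # Append "-name P1 -o -name P2 ... -name Pn" after the patterns,
    # then shift the original patterns away.
    n=$#
    i=0
    for pat in "$@"; do
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then
            set -- "$@" -name "$pat" -o
        else
            set -- "$@" -name "$pat"
        fi
    done
    shift "$n"
    find "$dir" -type f \( "$@" \) -print |
        if [ -n "$exclude" ]; then grep -v -e "$exclude"; else cat; fi
}
```

For example, list_files web_area/xrd/web/ web_stats '*.html' '*.htm' > web_area/isearch/tmpfile.txta produces the same list as the two find lines and the grep -v web_stats line used for the wwwxrd index. The patterns must be quoted so the shell does not expand them before find sees them.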

Creating Default HTML Forms Files

Iindex/Isearch provides a program, search_form, that generates default forms; it is described in the README file in the Isearch-cgi directory (excerpt below). The generated pages then call Isearch, passing the correct parameters to the search program.

Operation of Isearch-cgi
------------------------
1)  Create access points to databases

        Create a base HTML file with the program search_form.  It takes
        two arguments: the path to your databases, and the name of the
        database this new page should access.  The page is printed to
        standard output, so you may redirect it to a file if you like.
                search_form /home/databases TEST > form.html
        There is another, optional argument that indicates to
        search_form which type of search page you wish to generate.  The
        form types are:
                -simple
                -boolean         
                -advanced
                -html
        If no type is given to search_form, it will default to -simple
        Examples:
                search_form -simple /home/databases TEST > form.html
                search_form -boolean /home/databases TEST > boolean.html
                search_form -advanced /home/databases TEST > advanced.html
                search_form -html /home/databases TEST > htmlform.html

For example, to generate this for the CCP14 crystallographic Iindex database, you would use the command lines:

search_form -simple /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > form.html
search_form -boolean /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > boolean.html
search_form -advanced /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > advanced.html
search_form -html /web_disc/ccp14/web_area/isearch/ccp14web ccp14web > htmlform.html
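The four commands above differ only in the form type and output file, so they can be run in a loop. In this sketch make_forms is a hypothetical helper, SEARCH_FORM is a hypothetical variable defaulting to the search_form program from Isearch-cgi (it must be on your PATH or given as a full path), and the output file names are the ones used in the commands above.

```shell
#!/bin/sh
# Sketch: generate all four default search pages for one database.
# make_forms and SEARCH_FORM are hypothetical; search_form is the
# Isearch-cgi program described in the README excerpt above.

SEARCH_FORM=${SEARCH_FORM:-search_form}

# make_forms DBDIR DBNAME OUTDIR
make_forms() {
    dbdir=$1 dbname=$2 outdir=$3
    mkdir -p "$outdir" || return 1
    # Each entry is "form type:output file name without .html".
    for spec in simple:form boolean:boolean advanced:advanced html:htmlform; do
        kind=${spec%%:*}
        out=${spec#*:}.html
        "$SEARCH_FORM" "-$kind" "$dbdir" "$dbname" > "$outdir/$out" || return 1
    done
}
```

For example, make_forms /web_disc/ccp14/web_area/isearch/ccp14web ccp14web . writes form.html, boolean.html, advanced.html and htmlform.html into the current directory, ready for hand editing.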

Then edit the resulting HTML file to get it into the form you like. For the CCP14 Crystallographic search page, only the boolean and advanced search forms have been used. Full Text, TITLE, HEAD and ADDRESS are the searchable fields, with Full Text as the default. TITLE, HEAD and ADDRESS are the result display options, with TITLE as the default. AND, OR, NOT and NEAR are selected from a menu to relate keywords, with AND as the default.

Click here to check out the CCP14 Crystallographic Search Form

Isearch Documentation

Iindex/Isearch

Isearch-CGI Setup for the web
(There is also a Word Document that goes into the setup of the Web Interface for Isearch)

readme.txt


If you have any queries or comments, please feel free to contact the CCP14