Access - the retrieval of a resource from a server, such as a document, a page, a site, or a file.
Browser - a client application designed to view and interact with various Internet resources (e.g., Internet Explorer, Netscape Navigator, Opera).
Client - a software program designed to contact and obtain data from server software on a remote computer. A web browser is a specific type of client.
Crawling - the process in which a spider scans the Internet by following links from one web page to another, thus traversing an entire web site.
Default page / home page / index page - a web page opened in a browser at start-up. Another, more common meaning is "the main page of a website".
Directive - a keyword that a spider accepts as a command when reading the instructions file (for example, "User-agent" or "Disallow" in robots.txt).
Domain Name - the unique name that identifies the address of an Internet site.
Domains - the hierarchical scheme for indicating logical and sometimes geographical affiliation of a web page on the Net.
Download - transferring data from a remote computer to your local computer. The opposite of uploading.
Filename - the name of a file. It is usually composed of an alphanumeric base name and a file extension.
Filename extension - a suffix, usually of three characters, that describes the format of the file so that programs can recognize and open it (for example, '.doc' denotes a document file).
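To illustrate how a program can use the extension to decide what kind of file it is dealing with, here is a minimal Python sketch. The KNOWN_FORMATS table and the describe function are hypothetical, written for illustration only; real programs rely on much larger registries of formats.

```python
import os.path

# Hypothetical lookup table for illustration; real programs consult
# much larger registries of file formats.
KNOWN_FORMATS = {
    ".doc": "document file",
    ".html": "HTML page",
    ".txt": "plain text file",
}

def describe(filename: str) -> str:
    """Guess a file's format from its extension (case-insensitive)."""
    _base, extension = os.path.splitext(filename)  # "report.doc" -> ("report", ".doc")
    return KNOWN_FORMATS.get(extension.lower(), "unknown format")

print(describe("report.doc"))  # document file
print(describe("ROBOTS.TXT"))  # plain text file
print(describe("README"))      # unknown format
```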
Firewall - a complex of security measures designed to protect a local PC or network from unauthorized or unwanted access.
FTP - File Transfer Protocol - a very common and fast method of transferring files between two Internet sites.
FTP server - A server where data can be downloaded/uploaded via FTP.
Hub - a web page that contains many links to other pages. When a spider needs to crawl a Web site, it starts from hubs. Usually the only hub of a given web site is its home page. Following links from the hub, the spider locates all other pages of the site.
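The crawl described above, starting at a hub and following links, can be sketched as a breadth-first traversal. This is a minimal Python sketch under stated assumptions: the site is represented by a hypothetical in-memory SITE_LINKS dictionary instead of real HTTP requests, and breadth-first order is only one possible strategy; real spiders differ in ordering, politeness delays and limits.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for a real site:
# each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/about.html", "/products/", "/contact.html"],
    "/about.html": ["/"],
    "/products/": ["/products/a.html", "/products/b.html"],
    "/products/a.html": ["/products/"],
    "/products/b.html": ["/contact.html"],
    "/contact.html": [],
}

def crawl(hub: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first traversal starting from the hub page.

    Returns pages in the order the spider first reaches them.
    """
    visited = {hub}
    order = []
    queue = deque([hub])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in visited:  # never queue the same page twice
                visited.add(target)
                queue.append(target)
    return order

print(crawl("/", SITE_LINKS))
```

Keeping a visited set is what stops the spider from looping forever on pages that link back to the home page.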
HTML format - Hypertext Markup Language - a language designed for Internet documents. HTML uses a set of tags that indicate where and how the information should be displayed in a browser or other program. Tags are used to identify sections, paragraphs, numbered or bulleted lists, images, tables, headers, etc. The text may be formatted using different fonts, sizes, colors, styles, etc. HTML is a plain-text format; versions up to 4.01 were defined as an application of SGML, while the current standard is HTML5 (maintained by WHATWG as the HTML Living Standard).
HTML tag - a special symbol (or group of symbols) that marks a certain part of a document or indicates a certain text formatting method. Each tag begins with the "<" character and ends with ">". In HTML, tags are mostly used in pairs: a closing tag repeats the opening one, but has a slash after the "<" character (e.g., <p> and </p>).
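Python's standard html.parser module reports opening and closing tags separately, which makes the pairing described above easy to see. The TagLister class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Records opening and closing tags in the order they appear."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")   # opening tag, e.g. <b>

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")  # closing tag, with the slash after "<"

parser = TagLister()
parser.feed("<p>Read the <b>robots.txt</b> file.</p>")
parser.close()
print(parser.events)  # ['<p>', '<b>', '</b>', '</p>']
```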
HTTP protocol - Hypertext Transfer Protocol - the protocol for transferring hypertext from a web server to a web browser. The most common port is 80. This is the main WWW protocol, which allows HTML documents to be transferred easily from one node to another.
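As a rough illustration of what travels over the connection, the sketch below builds a minimal HTTP/1.1 GET request as raw bytes. The build_get_request function is hypothetical; it only constructs the request text and does not send anything.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes.

    Each line ends with CRLF, and an empty line terminates the headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",      # the Host header is mandatory in HTTP/1.1
        "Connection: close",  # ask the server to close the connection afterwards
        "",                   # empty line ending the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("www.example.com", "/robots.txt")
print(request.decode("ascii"))
```

To actually send it, one could open a TCP connection to port 80 with socket.create_connection((host, 80)) and pass the bytes to sendall(); in practice, a library such as urllib.request handles this.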
Internet - the vast collection of interconnected networks that communicate via the TCP/IP protocol suite, linking millions of independent networks into a global web.
Internet traffic - in website terms, the number of requests for files and documents. Traffic is most frequently measured in hits, accesses, or visits in a given time period. The number and volume of spider visits must also be taken into account when estimating the bandwidth the site needs to function effectively.
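IP Number - Internet Protocol Number - a unique number identifying a computer on the network. An IPv4 address is a 32-bit number written as four decimal parts (each from 0 to 255) separated by dots, e.g. 192.0.2.1; newer IPv6 addresses are 128-bit and written as hexadecimal groups separated by colons.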
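The structure of an IPv4 number can be shown with a short Python sketch that splits the dotted form into its four parts and combines them into the underlying 32-bit value. parse_ipv4 and to_int are hypothetical helpers; for real code, Python's standard ipaddress module performs this validation.

```python
def parse_ipv4(address: str) -> list[int]:
    """Split a dotted IPv4 address into its four numeric parts.

    Raises ValueError if the string is not a valid IPv4 address.
    """
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}")
    numbers = []
    for part in parts:
        if not part.isdigit():
            raise ValueError(f"non-numeric part: {part!r}")
        value = int(part)
        if value > 255:
            raise ValueError(f"part out of range 0-255: {value}")
        numbers.append(value)
    return numbers

def to_int(address: str) -> int:
    """Combine the four 8-bit parts into the single 32-bit number they represent."""
    result = 0
    for value in parse_ipv4(address):
        result = (result << 8) | value
    return result

print(parse_ipv4("192.0.2.1"))  # [192, 0, 2, 1]
print(to_int("192.0.2.1"))      # 3221225985
```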
LAN - Local Area Network - a network of computers within a particular location.
Link - generally any form of a hypertext link. There are links within the web page, within the site, and external links to other web sites.
Log file / logfile - as far as websites are concerned, this term refers to files where web servers record all requests for data in chronological order.
Log file / logfile analysis - analysing log files, usually with dedicated software, to determine where visitors come from, how often they return, and how they navigate through the site.
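A minimal log analysis sketch, assuming the widely used Apache "combined" log format (host, identity, user, time, request line, status, size, referrer, user agent). The sample lines and the summarize function are hypothetical; real log files may use other formats. The sketch counts requests per visitor host, collects referrers (where visitors came from), and notes which user agents fetched robots.txt, which usually identifies spiders.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request line" status size "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "https://search.example/?q=robots" "Mozilla/5.0"',
    '192.0.2.10 - - [10/Oct/2023:13:56:02 +0000] "GET /about.html HTTP/1.1" 200 2048 "http://www.example.com/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2023:14:01:11 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
]

def summarize(lines):
    """Count requests per host, tally referrers, and find robots.txt readers."""
    hosts = Counter()
    referers = Counter()
    robots_txt_agents = set()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that do not fit the expected format
        hosts[match["host"]] += 1
        if match["referer"] != "-":  # "-" means no referrer was sent
            referers[match["referer"]] += 1
        request_parts = match["request"].split()
        if len(request_parts) >= 2 and request_parts[1] == "/robots.txt":
            robots_txt_agents.add(match["agent"])
    return hosts, referers, robots_txt_agents

hosts, referers, robots = summarize(SAMPLE_LOG)
print(hosts.most_common())
print(robots)  # {'ExampleBot/1.0'}
```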
Log file database - (Robots.txt Editor) - a database of log files for a given project.
Login - an account name used to identify a user and obtain access to a system or resource.
Log profile - (Robots.txt Editor) - a template that indicates the order of fields in a log file.
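One way to picture a log profile is as an ordered list of field names applied to each line. The W3C Extended Log File Format uses the same idea: a "#Fields:" header names the fields in order. The Python sketch below follows that convention; the format of Robots.txt Editor's own profiles is not described here, so read_w3c_log and parse_with_profile are hypothetical helpers, and space-separated values without embedded spaces are assumed.

```python
def parse_with_profile(fields: list[str], line: str) -> dict[str, str]:
    """Map the space-separated values of one log line onto the profile's field names."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))

def read_w3c_log(lines: list[str]) -> list[dict[str, str]]:
    """Parse a W3C extended log, taking the field order from the #Fields: header."""
    fields: list[str] = []
    records = []
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()  # the profile: ordered field names
        elif line.startswith("#") or not line.strip():
            continue  # other directives (#Version, #Date, ...) and blank lines
        else:
            records.append(parse_with_profile(fields, line))
    return records

# Hypothetical sample log for illustration.
SAMPLE = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2023-10-10 13:55:36 192.0.2.10 GET /index.html 200",
    "2023-10-10 14:01:11 198.51.100.7 GET /robots.txt 200",
]

for record in read_w3c_log(SAMPLE):
    print(record["c-ip"], record["cs-uri-stem"], record["sc-status"])
```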
Project, project file - an encoded file in which the program stores all temporary settings between working sessions. These settings include:
- URL of the site, for which the instructions file is being created.
- A list of selected spiders and a set of restrictions for them.
- The FTP server address, port number, login, password, and path to the folder on the FTP server (or the path to a local folder) where the contents of the site are stored.
- A short description of the project (optional).
Proxy Server - an intermediary server between the client and the Internet. A client can be configured to use a proxy server, which is usually an HTTP server. It transmits client requests to remote servers and then passes the results back to the client. Proxy servers can cache the results and use the cached data instead of reloading it from remote servers.
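The caching behaviour can be sketched in a few lines. This is a minimal Python sketch assuming a hypothetical fetch function that stands in for the request to the remote server; unlike a real proxy, it caches responses forever and ignores HTTP caching headers such as Cache-Control.

```python
from typing import Callable

class CachingProxy:
    """Forwards requests to a fetch function and caches responses by URL."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch          # hypothetical stand-in for the remote server
        self._cache: dict[str, bytes] = {}
        self.remote_requests = 0     # how often the remote server was contacted

    def get(self, url: str) -> bytes:
        if url in self._cache:
            return self._cache[url]  # serve the cached copy
        self.remote_requests += 1
        body = self._fetch(url)      # forward the request to the remote server
        self._cache[url] = body
        return body

def fake_fetch(url: str) -> bytes:
    # Hypothetical stand-in for contacting the remote server.
    return f"content of {url}".encode("utf-8")

proxy = CachingProxy(fake_fetch)
proxy.get("http://www.example.com/")
proxy.get("http://www.example.com/")  # served from the cache
print(proxy.remote_requests)  # 1
```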
Password - a code (a group of characters) used for authentication in a secure system.
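Redirect - a web server response that sends a request for a specific URL on to another page. In HTTP this is done with a 3xx status code (such as 301 Moved Permanently or 302 Found) and a Location header giving the new address.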
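Following a chain of redirects can be sketched as below. The REDIRECTS table stands in for real server responses and, like the resolve function, is hypothetical; the sketch stops on loops and on overly long chains, a common safeguard in HTTP clients.

```python
# Hypothetical redirect table: URL -> (status code, Location target).
REDIRECTS = {
    "http://example.com/old.html": (301, "http://example.com/new.html"),
    "http://example.com/new.html": (302, "http://example.com/final.html"),
}

def resolve(url: str, redirects: dict[str, tuple[int, str]], max_hops: int = 5) -> str:
    """Follow redirects until reaching a URL that is not redirected.

    Raises RuntimeError on redirect loops or chains longer than max_hops.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise RuntimeError("too many redirects")
        _status, url = redirects[url]  # the status code is not needed to follow the chain
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)
        hops += 1
    return url

print(resolve("http://example.com/old.html", REDIRECTS))
```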
Registration key - a special set of characters that allows you to register your copy of the program at the developer's web site and thus remove all functionality limitations.
Remote - not residing locally. A remote resource is located on a computer somewhere out on the Net.
Robot / spider / metacrawler - a program that automatically traverses the Web's hypertext structure by retrieving a document and then recursively retrieving all documents linked from it. When visiting a website, robots often look for the robots.txt file in the root directory for instructions about specific areas of the site.
Robot Exclusion Protocol - the standard used for robot exclusion files (formalized in 2022 as RFC 9309). It defines the syntax and location of the robots.txt file and the way web robots parse that file. Each robot identifies itself by a user-agent name, and it must follow all directives in the section for that user-agent. If there are no directives specific to its user-agent, the robot follows all directives in the section for the universal user-agent (denoted by an asterisk). The robot exclusion file must be named robots.txt and placed in the root folder of your server.
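Python's standard library includes urllib.robotparser, which applies these rules. The sketch below parses a sample robots.txt (hypothetical content) and shows that a robot with its own section ignores the universal section, while other robots fall back to it. ExampleBot and OtherBot are made-up user-agent names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ExampleBot", "OtherBot"):
    for path in ("/private/page.html", "/drafts/page.html"):
        # ExampleBot obeys only its own section; OtherBot uses the "*" section.
        print(agent, path, parser.can_fetch(agent, path))
```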
Robot text file - a text file named 'robots.txt' that is located in the root directory of the web site and contains directives for Internet spiders that visit this site.
ROBOTS.TXT - the file name used by the robot exclusion protocol. Web robots read this file from the server's root directory and parse it for instructions about which pages/areas to index and which to skip. The file name should be all lowercase (robots.txt), because robots request exactly /robots.txt and URL paths are case-sensitive on many servers; the file must also be placed in the root folder.
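Because the file always lives at the root, a robot can derive its address from any page address. A minimal sketch using urllib.parse.urljoin from Python's standard library; robots_txt_url is a hypothetical helper name.

```python
from urllib.parse import urljoin

def robots_txt_url(page_url: str) -> str:
    """Return the address of the robots.txt file that governs page_url.

    Joining with an absolute path ("/robots.txt") replaces the page's path
    and query while keeping its scheme, host and port.
    """
    return urljoin(page_url, "/robots.txt")

print(robots_txt_url("http://www.example.com/products/a.html?x=1"))
# http://www.example.com/robots.txt
```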
Uploading - transferring data from your local computer to a remote computer. The opposite of downloading.
User agent - a technical name for programs that allow users to perform various networking operations. Examples of web user agents are Netscape Navigator and Microsoft Internet Explorer; widely known email user agents include Qualcomm Eudora and Microsoft Outlook. In robots.txt, the User-agent line holds the name by which a robot identifies itself.
URL - Uniform Resource Locator - an address used to locate files via HTTP and some other protocols such as telnet. It is the standard way of addressing resources on the Internet. The address is usually made up of the protocol, the domain name or IP address, the path to the document on the server, and the file or document name, e.g., http://www.domainname.com/path/filename.
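The parts of a URL listed above can be separated with Python's standard library. A minimal sketch; describe_url is a hypothetical helper, and the example URL uses the reserved example.com domain.

```python
import posixpath
from urllib.parse import urlsplit

def describe_url(url: str) -> dict[str, str]:
    """Split a URL into protocol, host, directory path and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)  # "/path/file.html" -> ("/path", "file.html")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",  # domain name or IP address
        "path": directory,
        "filename": filename,
    }

print(describe_url("http://www.example.com/path/filename.html"))
```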