Indexing and Hashing

If data is indexed, there is a separate file or structure that stores where each record, or each group of records matching the same criteria, is located. Without an index, a full scan of the data is required, and with thousands of records it takes time to find the one that is needed. If the request has parameters that must be compared against fields of the stored records, the search time increases drastically. Indexes have a cost as well. Every new index increases the size of the index file, and over time it can grow large enough that index retrieval itself slows down record reads. Any data modification also triggers changes in the index file. Over time the index can become fragmented, so it has to be reorganized or rebuilt, depending on the fragmentation percentage.
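To make the difference concrete, here is a minimal sketch in Python. It is not how a real database engine stores its indexes; the `records` list, the `city` field and the function names are made up for illustration. The index is simply a dictionary that maps a field value to the positions of the matching records.

```python
# Hypothetical sample data; a real table would live on disk.
records = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Riga"},
    {"id": 3, "city": "Oslo"},
]

def full_scan(records, city):
    """Without an index: compare the field of every stored record."""
    return [r for r in records if r["city"] == city]

def build_index(records, field):
    """Build a separate structure: field value -> positions of matching records."""
    index = {}
    for pos, record in enumerate(records):
        index.setdefault(record[field], []).append(pos)
    return index

def indexed_lookup(records, index, value):
    """With an index: jump straight to the stored positions."""
    return [records[pos] for pos in index.get(value, [])]

city_index = build_index(records, "city")
print(indexed_lookup(records, city_index, "Oslo"))
```

Note that `city_index` is extra data that has to be kept in sync: inserting, deleting or updating a record means updating the index too, which is the maintenance cost described above.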

In a hashed file, data is distributed and divided into blocks called buckets, and the buckets are the containers for the records. The bucket for a record is chosen by a special algorithm, the hash function, which is applied to the record's key. A record can be found easily by applying the same hash function to its key, which gives the bucket that contains the record. Collisions and clustering can occur in hashed files. A collision occurs when different keys hash to the same value. Clustering occurs when the hash function spreads keys unevenly, so that more search keys end up in some buckets than in others. Outside of file organization, hashing is also widely used for storing authentication information such as passwords.
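The bucket idea can be sketched in a few lines of Python. This is an illustration only: the number of buckets, the choice of CRC32 as the hash function and the sample keys are all assumptions, and a real hashed file keeps its buckets in disk blocks rather than in lists.

```python
import zlib

NUM_BUCKETS = 4  # assumed, deliberately small so collisions show up

# Each bucket is a container for (key, record) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key):
    """Hash function: map a record key to a bucket number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

def insert(key, record):
    buckets[bucket_for(key)].append((key, record))

def find(key):
    """Apply the same hash function, then search only that one bucket."""
    for stored_key, record in buckets[bucket_for(key)]:
        if stored_key == key:
            return record
    return None

# Hypothetical keys: six records into four buckets, so at least
# two keys must collide and share a bucket.
for name in ["anna", "boris", "chen", "dana", "emil", "fatima"]:
    insert(name, {"name": name})

print(find("chen"))
print([len(b) for b in buckets])  # uneven sizes indicate clustering
```

Because colliding keys are kept together in the same bucket, `find` still has to compare keys inside the bucket. The more unevenly the hash function distributes the keys, the longer that inner search becomes.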

