A trie is a N-ary tree data structure for storing strings in order to support fast string search. Tries can insert or search the string in O(L) time where L is length of string. Trie is mainly used in cases where dictionary kind of data structure needs to be maintained.
Assuming the dictionary contains all the strings were made of all small alphabets “a” to “z” then each node of the trie will have 26 pointers, one for each character. Let’s have a look into simple trie data structure definition.
#define LETTER_LENGTH 26
class Trie
{
public:
Trie* m_ptr[LETTER_LENGTH];
bool is_word_complete;
Trie ();
};
Trie::Trie ()
{
for (int i = 0; i < LETTER_LENGTH; i++)
{
m_ptr[i] = NULL;
}
}
Properties
Each trie node contains
- 26 Trie nodes pointer which will be use for each character used in our string inputs.
- In case the trie node represents an end of word then “is_word_complete” will be set to true to represent the same.
Let’s look into the below diagram to understand how Trie looks.
Insert Operation
To insert a string in Trie, we need to start from the root node and traverse through the tree according to the string characters. Let’s define an algorithm to insert a string in trie tree.
Algorithm
- Start with Trie root node and iterate through string characters
- For every character in string.
- Calculate the character index position (for eg: int index = key[i] – ‘a’)
- If Trie node doesn’t have any node at the index location then allocate a new node and point current Trie node’s pointer to newly created node.
- Set next node to the node pointed by the pointer present at index position.
- In case string is completed then set the “is_word_complete” to true.
Let’s look into below diagram to explain how strings are added into tries.
Let’s look into the function to insert an entry into Trie tree.
void insert (Trie* root, char* key)
{
Trie* temp = root;
for (int i = 0; i < strlen (key); i++)
{
int index = key[i] - 'a';
if (temp->m_ptr[index]) {
temp = temp->m_ptr[index];
} else {
Trie* next = new Trie();
temp->m_ptr[index] = next;
temp = next;
}
}
temp->is_word_complete = true;
}
Search Opeartion
To search a string in Trie, we need to start from the root and follow the characters path from the string to be searched. In case all paths are present and last node has “is_word_complete” variable true then word is matched else string is not present in Trie. Let’s define an algorithm to search a string in Trie tree.
Algorithm
- Start with Trie root node and iterate through string characters.
- For every character in string.
- Calculate the character index position (for eg: int index = key[i] – ‘a’)
- If trie node is present at the index location then check for next character and set trie node to this node present at index.
- Else string is not in Trie. Return false.
- If all characters are visited and are present in Trie and last node of trie has “is_word_complete” set to true then string is matched. Return True.
Let’s look into below diagram to explain how string search works in Trie.
Let’s look into the sample search function code to search a string in Trie data structure.
bool search (Trie* root, const char* key)
{
Trie* temp = root;
for (int i = 0; i < strlen (key); i++) {
if (!temp->m_ptr[(key[i] - 'a')]) {
return false;
}
temp = temp->m_ptr[(key[i] - 'a')];
}
return temp->is_word_complete;
}
Let’s Look into a sample main function to understand the usage of above functions.
int main ()
{
char keys[][20] = {"this", "is", "sparta", "these", "thief", "there"};
Trie* root = new Trie ();
int entries = sizeof (keys)/ sizeof (keys[0]);
for (int i = 0; i < entries; i++) {
insert (root, keys[i]);
}
cout << "Search key: this, found " << search (root, "this") << endl;
cout << "Search key: is, found " << search (root, "is") << endl;
cout << "Search key: madness, found " << search (root, "madness") << endl;
cout << "Search key: sparta, found " << search (root, "sparta") << endl;
}
Let’s check the output of above main function.
Search key: this, found 1
Search key: is, found 1
Search key: madness, found 0
Search key: sparta, found 1
Disadvantages
As we can see from the above Trie Class definition, it requires lot of memories to store the strings. There are 26 pointers per node which are maintained in trie, most of which are most likely be set to NULL. There are ways to improve the space utilization which we will talk in further articles.