In Data Structures and Algorithms (DSA), Trees
Trees are a fundamental data structure that help in efficiently organizing and managing hierarchical data. They are used in many applications like databases, file systems, search engines, and network routing, among others.
How Trees Help in DSA:
- Efficient Searching: Binary search trees (BST) allow fast searching, insertion, and deletion of data in O(log n) time (on average), making it much faster than linear search on an array.
- Hierarchical Data Representation: Trees help in representing hierarchical data, like file systems or organizational charts, where each node represents a unit, and the edges represent the relationship between them.
- Balanced Trees: Variants like AVL trees or Red-Black trees maintain balance, ensuring efficient performance in searching and updates.
- Priority Queue: Heaps (a type of binary tree) are used in priority queues, where elements are inserted or removed based on priority rather than order of arrival.
- Tree Traversal: Traversal algorithms like in-order, pre-order, and post-order are widely used in many scenarios like expression evaluation, syntax parsing, etc.
Example: Binary Search Tree (BST) Implementation
Let’s see an example of a Binary Search Tree (BST) in Python, and how it helps in organizing and searching data efficiently.
class Node:
def __init__(self, key):
self.left = None
self.right = None
self.value = key
class BinarySearchTree:
def __init__(self):
self.root = None
def insert(self, root, key):
if root is None:
return Node(key)
if key < root.value:
root.left = self.insert(root.left, key)
else:
root.right = self.insert(root.right, key)
return root
def search(self, root, key):
if root is None or root.value == key:
return root
if root.value < key:
return self.search(root.right, key)
return self.search(root.left, key)
def inorder(self, root):
return self.inorder(root.left) + [root.value] + self.inorder(root.right) if root else []
# Example Usage
bst = BinarySearchTree()
root = None
# Inserting nodes into the BST
keys = [20, 8, 22, 4, 12, 10, 14]
for key in keys:
root = bst.insert(root, key)
# Searching for a key
search_key = 10
found_node = bst.search(root, search_key)
if found_node:
print(f"Node {search_key} found in the tree.")
else:
print(f"Node {search_key} not found in the tree.")
# In-order Traversal (Sorted order for BST)
print("In-order Traversal:", bst.inorder(root))
Explanation:
- Node Class: Each node contains a
value, a reference to theleftchild, and a reference to therightchild. - insert() Method: This method inserts a new node into the tree by comparing the value to be inserted with the root. If the value is less than the current node, it goes to the left; if it's greater, it goes to the right.
- search() Method: This method searches for a given key in the BST. It compares the key with the root and recursively searches in the left or right subtree depending on whether the key is smaller or larger than the current node's value.
- inorder() Method: This method returns the in-order traversal of the tree (
left-root-right). For a BST, this traversal gives the elements in sorted order.
Output:
Node 10 found in the tree.
In-order Traversal: [4, 8, 10, 12, 14, 20, 22]
Real-World Applications of Trees:
- File System: The hierarchy of directories and files is often represented using trees, where each folder is a node and its subfolders or files are child nodes.
- Search Engines: Binary search trees, B-trees, or tries are used to efficiently store and search a large number of documents or URLs.
- Network Routing: Routing tables are often represented as trees or graphs, with nodes representing routers and edges representing paths between them.
- Expression Parsing: In compilers, abstract syntax trees (ASTs) are used to represent the structure of expressions and statements in source code.
Variants of Trees:
- Binary Search Tree (BST): Allows efficient search and insertion, but it can become unbalanced.
- AVL Tree: A balanced BST, ensures O(log n) time for search, insertion, and deletion.
- Red-Black Tree: Another self-balancing binary search tree.
- Heap: A specialized tree-based structure that satisfies the heap property and is used in priority queues.
Keyword Search in Paragraphs Using Binary Search Tree (BST)
Certainly! This example adapts a Binary Search Tree (BST) to search for keywords within paragraphs and indicate their positions in a text file. Characters are not converted into numbers or tokens—instead, each paragraph and keyword is treated as a plain string.
Approach
- Create a Binary Search Tree (BST)
- Each node stores a keyword
- The BST allows fast keyword lookup
- Search for Keywords in Paragraphs
- Each keyword is searched within each paragraph
- All matching positions are recorded
- Indicate Keyword Positions
- If a keyword is found, its index positions are displayed
- Write Results to a File
- Results are saved in a text file for later review
Example Python Code
class Node:
def __init__(self, key):
self.left = None
self.right = None
self.value = key
class BinarySearchTree:
def __init__(self):
self.root = None
def insert(self, root, key):
if root is None:
return Node(key)
if key < root.value:
root.left = self.insert(root.left, key)
else:
root.right = self.insert(root.right, key)
return root
def search(self, root, key):
if root is None or root.value == key:
return root
if root.value < key:
return self.search(root.right, key)
return self.search(root.left, key)
def search_keyword_in_paragraph(paragraph, keyword):
positions = []
start = 0
while start < len(paragraph):
start = paragraph.find(keyword, start)
if start == -1:
break
positions.append(start)
start += len(keyword)
return positions
def write_to_file(file_name, results):
with open(file_name, 'w') as file:
for result in results:
file.write(result + '\n')
def main():
paragraphs = [
"This is the first paragraph. It contains some words.",
"The second paragraph has some different words.",
"Here is the third one. Keywords are fun to search.",
"The fourth paragraph also has the word search."
]
keywords = ["search", "words", "fun"]
bst = BinarySearchTree()
root = None
for keyword in keywords:
root = bst.insert(root, keyword)
results = []
for i, paragraph in enumerate(paragraphs):
result = f"Paragraph {i+1}: {paragraph}\n"
for keyword in keywords:
if bst.search(root, keyword):
positions = search_keyword_in_paragraph(paragraph, keyword)
if positions:
result += f"Keyword '{keyword}' found at positions: {positions}\n"
else:
result += f"Keyword '{keyword}' not found in this paragraph.\n"
results.append(result)
write_to_file("keyword_search_results.txt", results)
if __name__ == "__main__":
main()
Explanation of the Code
- Node Class: Stores a keyword and references to left and right child nodes
- BinarySearchTree Class: Handles insertion and searching of keywords
- search_keyword_in_paragraph(): Finds all positions of a keyword in a paragraph
- write_to_file(): Saves results to a text file
- main(): Controls the program flow and ties everything together
Example Input
Paragraph 1: This is the first paragraph. It contains some words.
Paragraph 2: The second paragraph has some different words.
Paragraph 3: Here is the third one. Keywords are fun to search.
Paragraph 4: The fourth paragraph also has the word search.
Example Keywords
searchwordsfun
Example Output (keyword_search_results.txt)
Paragraph 1: This is the first paragraph. It contains some words.
Keyword 'search' not found in this paragraph.
Keyword 'words' found at positions: [41]
Keyword 'fun' not found in this paragraph.
Paragraph 2: The second paragraph has some different words.
Keyword 'search' not found in this paragraph.
Keyword 'words' found at positions: [47]
Keyword 'fun' not found in this paragraph.
Paragraph 3: Here is the third one. Keywords are fun to search.
Keyword 'search' found at positions: [56]
Keyword 'words' not found in this paragraph.
Keyword 'fun' found at positions: [35]
Paragraph 4: The fourth paragraph also has the word search.
Keyword 'search' found at positions: [50]
Keyword 'words' not found in this paragraph.
Keyword 'fun' not found in this paragraph.
Summary
- BST enables fast keyword lookup
- String-based searching preserves original text structure
- Keyword positions are clearly tracked and stored
- Useful for text analysis, document scanning, and indexing