Codepath

Validating HTML Tags

Unit 4 Session 2 Standard (Click for link to problem statements)

U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

  • Q: What is the goal of the problem?
    • A: The goal is to determine if a string of HTML-like tags is properly nested and closed.
  • Q: What are the inputs?
    • A: The input is a string containing HTML-like tags (e.g., <div>, </div>).
  • Q: What are the outputs?
    • A: The output is a boolean value: True if the tags are properly nested and closed, False otherwise.
  • Q: What assumptions are made about the tags?
    • A: Tags are well-formed and can be nested but not improperly overlapped (e.g., <div><p></div></p> is invalid).
  • Q: How should nested tags be handled?
    • A: Tags should be checked to ensure they are nested properly with corresponding closing tags in the correct order.

P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea: Use a stack to track opening tags. As you iterate through the string, push opening tags onto the stack. For closing tags, check if they match the tag on top of the stack. If they match, pop the stack; otherwise, return False. At the end, if the stack is empty, return True; otherwise, return False.

1) Initialize an empty stack to keep track of opening tags.
2) Iterate through the `html` string using a loop.
   a) When an opening `<` is found, identify the tag.
   b) If the tag is an opening tag (does not start with `/`), push it onto the stack.
   c) If the tag is a closing tag (starts with `/`), check if the stack is not empty and if the top of the stack matches the corresponding opening tag.
   d) If it matches, pop the stack; otherwise, return `False`.
3) After the loop, if the stack is empty, return `True`; otherwise, return `False`.

**⚠️ Common Mistakes**

- Forgetting to check for mismatched or improperly nested tags.
- Not correctly handling cases where the stack is empty when a closing tag is encountered.
- Assuming that the input string is always valid without accounting for edge cases.

I-mplement

def validate_html_tags(html):
    stack = []
    i = 0

    while i < len(html):
        if html[i] == '<':
            j = i + 1
            while j < len(html) and html[j] != '>':
                j += 1
            tag = html[i+1:j]
            if not tag.startswith('/'):
                # It's an opening tag, push onto stack
                stack.append(tag)
            else:
                # It's a closing tag, pop from stack and check
                if not stack or stack[-1] != tag[1:]:
                    return False
                stack.pop()
            i = j
        i += 1

    # If stack is empty, all tags were properly closed
    return len(stack) == 0
Example Usage:

html = "<div><p></p></div>"
print(validate_html_tags(html))  # Output: True

html_2 = "<div><p></div></p>"
print(validate_html_tags(html_2))  # Output: False

html_3 = "<div><p><a></a></p></div>"
print(validate_html_tags(html_3))  # Output: True

html_4 = "<div><p></a></p></div>"
print(validate_html_tags(html_4))  # Output: False
Fork me on GitHub