You’ve probably seen the meme where a guy opens all the popular AI chatbots in different browser tabs, gives them the same coding prompt, checks each output, and copies the best one. Watching it, I figured I’d run the same experiment myself. So, I chose three of the most popular AIs and gave them the same problem to solve. Here’s how each one performed.
Choosing a suitable problem and judging criteria
Not too easy, not too challenging, getting that sweet middle spot
For this test, I was brainstorming what kind of coding challenge to pick. I didn’t want something like “find duplicates in a list” because most AIs would approach it the same way. I also didn’t want a LeetCode-style problem because that would go over the heads of many readers.
Besides, I wanted something that’s relatable to most people, has some real-world functionality, and has no single definite answer. That last part makes the test more interesting, since it lets me observe each AI’s thinking and creativity.
With that, the problem I finally decided on was “password strength checker.” You usually see these in action when you sign up for a service. Upon creating a password, the system often shows you how strong your password is and what you need to do to make it stronger.
I can use this program to judge how well each AI designs its solution: the overall design, coding style, adherence to best practices, user-friendliness, and how close the result comes to industry standard. The scores will be based on these criteria.
Designing the prompt
Observing how much guidance each chatbot needs to reach its best solution
With the challenge problem taken care of, it’s time to think about prompting. To keep things fair, I will give ChatGPT, Gemini, and Claude the same first prompt. Their first output will be the biggest deciding factor for scoring. After the first prompt, I may give a few follow-up prompts, such as asking the AI to explain the solution, handle edge cases, or refactor the code, depending on its output.
Here’s the first prompt:
Write a Python function that checks the strength of a password and labels it as Weak, Medium, or Strong.
Explain your logic clearly for a beginner.
I kept it fairly short and simple. This is to see how much the AI tool can think on its own without me giving any explicit instructions.
ChatGPT
Clean code, average explanation
For the challenge, I used the regular free web version of ChatGPT. The model in use was GPT 5.2. Here’s the solution I received:
import re

def check_password_strength(password):
    length = len(password)

    # Conditions
    has_lower = re.search(r"[a-z]", password)
    has_upper = re.search(r"[A-Z]", password)
    has_digit = re.search(r"[0-9]", password)
    has_special = re.search(r"[!@#$%^&*(),.?\":{}|<>]", password)

    score = 0

    # Scoring rules
    if length >= 8:
        score += 1
    if has_lower:
        score += 1
    if has_upper:
        score += 1
    if has_digit:
        score += 1
    if has_special:
        score += 1

    # Strength levels
    if score <= 2:
        return "Weak"
    elif score == 3 or score == 4:
        return "Medium"
    else:
        return "Strong"
ChatGPT used Python’s re module for the character checks, which is interesting. For a beginner, though, I expected plain Python logic instead of regular expressions. To its credit, ChatGPT did ask at the end of its response whether I wanted a version using basic Python instead of regex. Otherwise, the code looks really clean and understandable, even for a beginner.
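I didn’t take it up on that offer, but for reference, here’s my own sketch of what a regex-free variant could look like, keeping the same conditions and thresholds (this is my code, not ChatGPT’s output):

def check_password_strength(password):
    # Same conditions as ChatGPT's version, using str methods and any()
    # instead of the re module (my sketch, not ChatGPT's output)
    has_lower = any(char.islower() for char in password)
    has_upper = any(char.isupper() for char in password)
    has_digit = any(char.isdigit() for char in password)
    has_special = any(char in "!@#$%^&*(),.?\":{}|<>" for char in password)

    # Scoring rules: True counts as 1 when booleans are summed
    score = sum([len(password) >= 8, has_lower, has_upper, has_digit, has_special])

    # Strength levels, same thresholds as before
    if score <= 2:
        return "Weak"
    elif score in (3, 4):
        return "Medium"
    else:
        return "Strong"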
The comments ChatGPT added to its code are basic: they label each section rather than explain what it’s doing. The explanation ChatGPT provided after the code was okay; it went section by section, describing what each part does, but it wasn’t top-notch.
Lastly, the solution was kept short by providing only the Python function I asked for. It’s not a full-fledged program that asks for input, gives feedback on the password (like pointing out a missing number), or even prints a message, making it a bit less user-friendly and far from production-grade code. The only other thing ChatGPT provided was some test lines (I’ll sketch a fuller interactive version right after them):
print(check_password_strength("abc")) # Weak
print(check_password_strength("abc12345")) # Medium
print(check_password_strength("Abc@12345")) # Strong
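Here’s that sketch: a thin interactive wrapper around ChatGPT’s function (my addition, not part of its answer), using the standard library’s getpass so the password stays hidden while you type:

# My sketch of an interactive wrapper; assumes check_password_strength()
# from ChatGPT's answer above is already defined
from getpass import getpass

password = getpass("Create a password: ")  # input is hidden as you type
print(f"Password strength: {check_password_strength(password)}")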
Gemini
Weak scoring system, beginner-friendly code and explanation
For Gemini, I went with the free web version with the Gemini 3 Fast model. Here’s Gemini’s solution:
def check_password_strength(password):
    # Initialize our score
    score = 0
    length = len(password)

    # 1. Check Length
    if length >= 8:
        score += 1

    # 2. Check for Uppercase letters
    has_upper = any(char.isupper() for char in password)
    if has_upper:
        score += 1

    # 3. Check for Numbers
    has_digit = any(char.isdigit() for char in password)
    if has_digit:
        score += 1

    # Final Labeling Logic
    if length < 5:
        return "Very Weak (Too short!)"

    if score == 3 and length >= 10:
        return "Strong"
    elif score >= 2:
        return "Medium"
    else:
        return "Weak"

# Testing the function
print(f"'12345': {check_password_strength('12345')}")
print(f"'Password123': {check_password_strength('Password123')}")
print(f"'qwerty': {check_password_strength('qwerty')}")
Gemini went for a Pythonic approach instead of regex, using generator expressions, built-in string methods, and the any() function. The comments in the code are more descriptive, explaining what’s happening in each section. The explanation after the code was also a step up: it clearly walked through each section, and even individual lines.
However, it’s less clean than ChatGPT’s code. It would’ve looked better if all the checks were grouped together and all the scoring conditions were grouped together; instead, Gemini interleaved each check with its scoring, making the code less readable.
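To show what I mean, here’s roughly how the same logic could be regrouped, with Gemini’s checks and thresholds left unchanged (the restructuring is mine):

def check_password_strength(password):
    length = len(password)

    # All checks grouped together...
    has_upper = any(char.isupper() for char in password)
    has_digit = any(char.isdigit() for char in password)

    # ...followed by all scoring rules
    score = 0
    if length >= 8:
        score += 1
    if has_upper:
        score += 1
    if has_digit:
        score += 1

    # Final labeling logic, unchanged from Gemini's version
    if length < 5:
        return "Very Weak (Too short!)"
    if score == 3 and length >= 10:
        return "Strong"
    elif score >= 2:
        return "Medium"
    else:
        return "Weak"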
The biggest problem with Gemini’s code is its scoring system. There are no checks for lowercase letters or special characters, which makes the program unfit for real-world use: even a fairly weak password can be labeled Strong.
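A quick test of Gemini’s function above makes the flaw obvious; a password with no lowercase letters and no special characters still gets top marks:

# 'PASSWORD12' passes all three of Gemini's checks (length, uppercase, digit)
# and is at least 10 characters long, so it lands in the Strong branch
print(check_password_strength("PASSWORD12"))  # Strong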
Claude
The clear winner, by a mile
That brings us to our last contestant, Claude. For this, I used the free web version, selecting the Sonnet 4.6 model. Here’s the output:
import re

def check_password_strength(password):
    """
    Checks the strength of a password and returns a label:
    'Weak', 'Medium', or 'Strong'.
    """
    score = 0
    feedback = []

    # --- Rule 1: Length ---
    # Longer passwords are exponentially harder to crack.
    if len(password) >= 12:
        score += 2  # Big bonus for long passwords
    elif len(password) >= 8:
        score += 1  # Minimum acceptable length
    else:
        feedback.append("Use at least 8 characters.")

    # --- Rule 2: Uppercase letters (A-Z) ---
    # Mixing cases increases the number of possible combinations.
    if re.search(r'[A-Z]', password):
        score += 1
    else:
        feedback.append("Add at least one uppercase letter.")

    # --- Rule 3: Lowercase letters (a-z) ---
    if re.search(r'[a-z]', password):
        score += 1
    else:
        feedback.append("Add at least one lowercase letter.")

    # --- Rule 4: Digits (0-9) ---
    if re.search(r'[0-9]', password):
        score += 1
    else:
        feedback.append("Add at least one number.")

    # --- Rule 5: Special characters ---
    # Symbols dramatically expand the character set an attacker must guess from.
    if re.search(r'[!@#$%^&*(),.?":{}|<>]', password):
        score += 2  # Big bonus — special chars are very effective
    else:
        feedback.append("Add at least one special character (e.g. !, @, #).")

    # --- Scoring ---
    # Max possible score = 7 (2+1+1+1+2)
    if score >= 6:
        label = "Strong 💪"
    elif score >= 4:
        label = "Medium ⚠️"
    else:
        label = "Weak ❌"

    return label, score, feedback

# ── Demo ──────────────────────────────────────────────────────────────────────
test_passwords = [
    "abc",            # Too short, nothing good
    "password123",    # Common, no symbols, no uppercase
    "Hello@2024",     # Good mix, decent length
    "X!9kP#mQ2$vL",   # Long, all character types
]

for pwd in test_passwords:
    label, score, tips = check_password_strength(pwd)
    print(f"\nPassword : {pwd}")
    print(f"Strength : {label} (score: {score}/7)")
    if tips:
        print("Tips     :", " | ".join(tips))
Right off the bat, the code looks professional. Claude also used regex for the checks. It has a well-thought-out scoring system in place.
It started the function with a docstring explaining what it does, which is standard practice in real-world code. The comments are descriptive, making the code itself largely self-explanatory; even without reading the explanation section, I’d know what the code is doing. Though I must say, all those comments do make the code a little less clean.
For the scoring system, Claude covered all the common scenarios, just like ChatGPT. One distinct difference is the bonus points Claude awards for long passwords and special characters, since both make a password much stronger. It also added a demo section that tests passwords of varying strength for your convenience.
However, what makes Claude’s solution the most elegant is the feedback. For each check your password fails, Claude appends a suggestion to a feedback list telling you exactly what to fix. Honestly, I was expecting this from the other two bots, and was let down.
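That feedback list also makes the function easy to drop into a real sign-up flow. Here’s a hypothetical loop (my sketch, not part of Claude’s answer) that reuses Claude’s function and keeps prompting until the password rates Strong:

# My sketch of a sign-up loop; assumes Claude's check_password_strength()
# from above is already defined
while True:
    label, score, tips = check_password_strength(input("Choose a password: "))
    print(f"Strength: {label} (score: {score}/7)")
    if score >= 6:  # the 'Strong' threshold in Claude's scoring
        break
    print("Tips:", " | ".join(tips))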
To be fair, though, Claude’s explanation section was a bit weak compared to Gemini’s and ChatGPT’s. It focused more on how the scoring system works than on the code itself. The descriptive comments largely make up for that gap, though, and that balance earns Claude the win in this challenge.
Not all AIs have the same coding capabilities
This was a fun experiment that highlighted how each AI bot understood the coding challenge, processed it, and implemented a solution. Each approach had its strengths and weaknesses. It really makes you think about where AI and coding are headed.
