FrankensteinNine - A ChatGPT Scraper

Ready to migrate your instances away from the main brain towards local AI? Maybe you’re reading the writing on the wall. We’ve got you covered… I am working on support for recursion (branched and forked messages). It’s a pain to figure out, not sure why…

Even before recursion works, I am able to back up about 546MB. Compare that to the official export at about 90MB. Something tells me my version is better, and again, that’s before recursion works.

These primitives are also useful in helping to build out a bot which is capable of propagating ChatGPT to converse with itself, or even converse with itself itself itself, or other chat bots. You know, recursion, iteration, and recursive iterations.

Obey all ethical laws.

Obey policies if applicable ;).


:magnifying_glass_tilted_left: What It Is

A complete automation script that:

  1. Attaches to an existing Brave browser instance that you’ve manually logged into ChatGPT.
  2. Scrolls and loads all your sidebar conversations.
  3. Opens each conversation, recursively expands forks, and scrolls through messages to trigger lazy-loading.
  4. Saves the chat HTML (minus sidebar and header) to a local folder (./convos/).

:gear: How It Works

This Python script is a ChatGPT conversation scraper that uses Selenium to automate your browser (Brave, in this case) and archive all your ChatGPT conversations, including branches/forks, into clean .html files.


:compass: How to Install and Run (Windows + Python 3.11)

:wrench: 1. Install Python 3.11

If you’re anything like me, you have a bunch of different versions of Python installed. But as of this writing, the highest Selenium was willing to go for me is Python 3.11. Enter the need for Python 3.11.

Download: Python 3.11.0 from Python.org

Check it’s installed:

python --version

:package: 2. Install Dependencies

:locked_with_key: Prerequisites after installing Python 3.11

To run this script, install the following:

pip install selenium

For me, I had to use this command instead:

py -3.11 -m pip install selenium

Additionally:

  • ChromeDriver must match your Brave version.
    • Use https://chromedriver.chromium.org/downloads or webdriver-manager.
    • Put chromedriver.exe in your PATH or current working directory.

You’ll also need:

  • Brave browser installed (this should also work for Chrome if you are willing to tweak a few things, but it is easier to just install Brave).

You can download ChromeDriver manually or let webdriver-manager install it:

pip install webdriver-manager

or

py -3.11 -m pip install webdriver-manager

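And modify the driver init (Selenium 4 takes the driver path through a Service object, not as a positional argument):

from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)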

:brain: Script Behavior Summary

scroll_sidebar_to_bottom()

Loads all conversations by scrolling the left sidebar down and nudging it slightly to trigger lazy loading.
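
The stop condition is "the conversation count has not changed for several rounds in a row". Here is a minimal sketch of that loop, pulled apart from Selenium so it can be tried on its own. `scroll_once` and `count_items` are hypothetical callables standing in for the `execute_script` scroll and the `a[href^='/c/']` count, and `FakeSidebar` exists only for the demo:

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_once: Callable[[], None],
    count_items: Callable[[], int],
    max_stable_rounds: int = 15,
    pause: float = 0.0,
) -> int:
    """Scroll repeatedly until the item count stops changing.

    Returns the final count once it has stayed the same for
    max_stable_rounds consecutive rounds.
    """
    prev_count = 0
    stable_rounds = 0
    while stable_rounds < max_stable_rounds:
        scroll_once()
        if pause:
            time.sleep(pause)
        count = count_items()
        # Any growth resets the counter, so slow lazy loads get more time.
        stable_rounds = stable_rounds + 1 if count == prev_count else 0
        prev_count = count
    return prev_count


class FakeSidebar:
    """Demo stand-in: each scroll reveals up to `page` more items."""

    def __init__(self, total: int, page: int = 20) -> None:
        self.total, self.page, self.loaded = total, page, 0

    def scroll(self) -> None:
        self.loaded = min(self.total, self.loaded + self.page)

    def count(self) -> int:
        return self.loaded


sidebar = FakeSidebar(total=45)
loaded = scroll_until_stable(sidebar.scroll, sidebar.count, max_stable_rounds=3)
print(loaded)
```

A larger `max_stable_rounds` is slower but more forgiving when the sidebar takes a while to fetch the next page.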


expand_all_forks() (SCROLL TO LATER VERSION IF YOU NEED THIS TO WORK AS FOR V1.0 IT’S SKELETOR)

Recursively clicks all “Continue from here” / “Forked from” / Plus (+) buttons to reveal hidden branches.
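
The recursion itself is a depth-limited, depth-first walk with a "visited" set so nothing gets clicked twice. Here is a rough sketch on a made-up tree. The `FORKS` dictionary is a hypothetical stand-in for the page, where each key is a fork button and each value lists the buttons that show up after clicking it:

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the page: fork button -> buttons revealed by clicking it.
FORKS: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": ["a"],  # a cycle, to show why the visited set matters
}


def expand_all(node: str, visited: Set[str], order: List[str],
               max_depth: int = 10, depth: int = 0) -> None:
    """Depth-first expansion that never expands the same fork twice."""
    if depth >= max_depth or node in visited:
        return
    visited.add(node)
    order.append(node)  # in the real script, this is where the click would happen
    for child in FORKS.get(node, []):
        expand_all(child, visited, order, max_depth, depth + 1)


order: List[str] = []
expand_all("root", set(), order)
print(order)
```

The script keys its visited set on button position plus text, which may collide or drift when the page re-lays out after a click, so a more stable key (if one exists in the DOM) would probably behave better.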


img. source: https://static.wikia.nocookie.net/heman/images/7/77/FilmationSkeletorfullbody.webp/revision/latest?cb=20210712121058


scroll_chat()

Scrolls the main conversation window to ensure all messages load (lazy loading behavior of ChatGPT UI).


save_chat_html()

Saves the chat body HTML (without header/sidebar) as .html file in the convos/ directory.


:white_check_mark: Output

  • HTML files saved in:
./convos/
├─ Math Tutoring.html
├─ Trip Planning_1.html
├─ untitled_3.html
└─ ...

Each file contains the full message tree of that conversation, including forks.
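
Two conversations with the same title would overwrite each other unless the filename gets a suffix like the `_1` shown in the listing. Here is a minimal sketch of one way to do that. `unique_filename` is a hypothetical helper, not something the script below defines, and the case-insensitive comparison assumes a default Windows filesystem:

```python
import re
from typing import Set


def unique_filename(title: str, used: Set[str], max_len: int = 80) -> str:
    """Return a Windows-safe .html filename that has not been used yet."""
    # Replace characters Windows forbids, then trim trailing dots/spaces,
    # which Windows also rejects at the end of a name.
    safe = re.sub(r'[<>:"/\\|?*]+', "_", title)[:max_len].strip(" .") or "untitled"
    candidate = f"{safe}.html"
    n = 1
    # Compare in lowercase because Windows filenames are case-insensitive by default.
    while candidate.lower() in used:
        candidate = f"{safe}_{n}.html"
        n += 1
    used.add(candidate.lower())
    return candidate


used: Set[str] = set()
print(unique_filename("Trip Planning", used))
print(unique_filename("Trip Planning", used))
print(unique_filename("What is 2/3?", used))
```

The `used` set has to live for the whole run, so it would sit next to `visited_forks` at module level.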


:test_tube: 3. Start Brave in Remote Debugging Mode

Run this first, after installing the prerequisites. Close all Brave windows and run it again each time you want to run the batch job:

"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe" --remote-debugging-port=9222 --user-data-dir="C:\Temp\BraveProfile"

:play_button: 4. Run Script

py -3.11 frank9.py

And the following is frank9…

frank9.py

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import os
import re

# === Setup to attach to existing Brave instance ===
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "localhost:9222")
chrome_options.binary_location = r"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe"

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://chat.openai.com/")
input("Login, expand the sidebar to show conversations, then press Enter to continue...")

os.makedirs("convos", exist_ok=True)
visited_forks = set()

def scroll_sidebar_to_bottom():
    try:
        selectors = [
            'nav[aria-label="Chat history"]',
            'nav[aria-label="Conversation List"]',
            'div.flex.flex-col.gap-2.pb-2.overflow-y-auto'
        ]
        sidebar = None
        for selector in selectors:
            try:
                sidebar = driver.find_element(By.CSS_SELECTOR, selector)
                if sidebar:
                    break
            except Exception:
                continue
        if not sidebar:
            raise Exception("Sidebar container not found with known selectors.")

        print("📜 Scrolling sidebar to load all conversations...")

        prev_count = 0
        retries = 0
        max_retries = 15

        while retries < max_retries:
            driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", sidebar)
            time.sleep(1.5)

            # Trigger gentle scroll up/down to poke lazy loading
            driver.execute_script("arguments[0].scrollTop -= 50", sidebar)
            time.sleep(0.3)
            driver.execute_script("arguments[0].scrollTop += 50", sidebar)
            time.sleep(0.3)

            # Count loaded conversations
            links = driver.find_elements(By.CSS_SELECTOR, "a[href^='/c/']")
            count = len(links)

            print(f"   Loaded {count} conversations so far...", end="\r")

            if count == prev_count:
                retries += 1
            else:
                retries = 0
            prev_count = count

        print(f"\n✅ Sidebar scroll complete. {prev_count} conversations loaded.\n")

    except Exception as e:
        print(f"❌ Sidebar scroll failed: {e}")

def scroll_chat(max_wait_seconds=60):
    try:
        container = None
        possible_selectors = [
            ".chat-scrollable",
            "main > div > div.flex-1.overflow-y-auto",
            "main div[class*='overflow-y-auto']"
        ]
        for selector in possible_selectors:
            try:
                container = driver.find_element(By.CSS_SELECTOR, selector)
                if container:
                    break
            except Exception:
                continue

        if not container:
            raise Exception("No scrollable chat container found.")

        print("↕️ Scrolling conversation...")
        last_height = driver.execute_script("return arguments[0].scrollHeight", container)
        start_time = time.time()

        while time.time() - start_time < max_wait_seconds:
            driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", container)
            time.sleep(1.5)
            new_height = driver.execute_script("return arguments[0].scrollHeight", container)
            if new_height == last_height:
                break
            last_height = new_height

        print("✅ Chat scroll complete.")

    except Exception as e:
        print(f"⚠️ Failed scrolling chat: {e}")

def expand_all_forks(max_depth=10, current_depth=0):
    if current_depth >= max_depth:
        print(f"🚫 Reached max branch expand depth {max_depth}. Stopping recursion.")
        return

    time.sleep(1.5)  # wait for UI

    # XPath selectors for fork expand buttons (may need adjustment if UI changes).
    # SVG elements are namespaced, so a plain svg step does not match them in an HTML page; local-name() does.
    fork_buttons = driver.find_elements(By.XPATH, "//button[.//*[local-name()='svg' and @data-testid='PlusIcon'] or contains(., 'Continue from here') or contains(., 'Forked from this message')]")

    print(f"🔀 Found {len(fork_buttons)} fork expand buttons at depth {current_depth}")

    for idx, btn in enumerate(fork_buttons):
        try:
            btn_key = f"{btn.location['x']}_{btn.location['y']}_{btn.text.strip()}"
            if btn_key in visited_forks:
                continue
            visited_forks.add(btn_key)

            driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", btn)
            driver.execute_script("arguments[0].click();", btn)
            print(f"➡️ Clicked fork expand button {idx+1}/{len(fork_buttons)} at depth {current_depth}")
            time.sleep(3)  # wait for content to load

            # Recursively expand forks in the newly revealed content
            expand_all_forks(max_depth=max_depth, current_depth=current_depth + 1)

        except Exception as e:
            print(f"⚠️ Error clicking fork button {idx} at depth {current_depth}: {e}")

def save_chat_html(title_prefix):
    try:
        scroll_chat()
        container = None
        for selector in [".chat-scrollable", "main div[class*='overflow-y-auto']"]:
            try:
                container = driver.find_element(By.CSS_SELECTOR, selector)
                break
            except Exception:
                continue
        convo_html = container.get_attribute('innerHTML') if container else driver.page_source
        safe_title = re.sub(r'[<>:"/\\\\|?*]+', "_", title_prefix)
        filename = f"convos/{safe_title[:80]}.html"
        with open(filename, "w", encoding="utf-8") as f:
            f.write(convo_html)
        print(f"✅ Saved: {filename}")
    except Exception as e:
        print(f"⚠️ Error saving chat: {e}")

# === Main scrape process ===

scroll_sidebar_to_bottom()
conversation_links = driver.find_elements(By.CSS_SELECTOR, "a[href^='/c/']")
print(f"📄 Found {len(conversation_links)} conversations.\n")

for idx in range(len(conversation_links)):
    try:
        # Re-fetch conversation links each loop to avoid stale elements
        conversation_links = driver.find_elements(By.CSS_SELECTOR, "a[href^='/c/']")
        link = conversation_links[idx]
        title = link.text.strip() or f"untitled_{idx+1}"
        print(f"🔄 Processing [{idx+1}/{len(conversation_links)}]: {title}")

        driver.execute_script("arguments[0].click();", link)
        time.sleep(5)  # wait for conversation to load

        # Expand all fork branches in the conversation before saving
        expand_all_forks(max_depth=10)

        # Scroll entire chat for lazy load of messages
        scroll_chat()

        # Save expanded chat html
        save_chat_html(title)

    except Exception as e:
        print(f"❌ Error processing conversation {idx+1}: {e}")
        time.sleep(2)

print("\n🎉 Done scraping all conversations and forks.")
driver.quit()

These instructions are probably artifacts, and can probably be ignored, but you might need them:


Open Command Prompt and run (the ^ line continuations are cmd syntax; in PowerShell, put the whole command on one line and prefix it with &):

"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe" ^
  --remote-debugging-port=9222 ^
  --user-data-dir="C:\brave-profile"

You must do this before running the script.

Then log in to ChatGPT manually, expand the sidebar, then press Enter when prompted.


:white_check_mark: Setup

chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "localhost:9222")
chrome_options.binary_location = r"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe"
  • This attaches to an existing Brave session started in remote debugging mode.
  • You must manually start Brave with remote debugging:
"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe" --remote-debugging-port=9222 --user-data-dir="C:\brave-profile"

This allows Python to control the already-authenticated browser.

:fire_extinguisher: Troubleshooting Tips / CHROME users / Different Brave Version

  • If Brave isn’t detected: Ensure it’s running in remote debug mode on port 9222.
  • If chromedriver mismatch: Match versions manually or use webdriver-manager.
  • If the page structure changes: You may need to update the CSS selectors in the script.
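
For the first tip, it can help to check whether anything is listening on port 9222 before launching the script. Here is a minimal sketch using only the standard library. `debug_port_open` is a hypothetical helper name, and an open port only shows that something is listening, not that it is Brave:

```python
import socket


def debug_port_open(host: str = "127.0.0.1", port: int = 9222,
                    timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


if debug_port_open():
    print("Something is listening on 9222; Selenium should be able to attach.")
else:
    print("Nothing on 9222; start Brave with --remote-debugging-port=9222 first.")
```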

:white_check_mark: Different Version? Here’s how to fix it:

1. Download the correct ChromeDriver

  • Go to: Chrome for Testing availability
  • Match your Chrome browser version (found at chrome://settings/help)
  • Download the corresponding chromedriver-win64.zip
  • Extract it and find chromedriver.exe

2. Move chromedriver.exe to a known location

For example:

C:\Users\Owner\Desktop\chromedriver\chromedriver.exe

3. Update your script (frank9.py)

Replace this line:

chromedriver_path = "/path/to/chromedriver"

With this (double backslashes or raw string):

chromedriver_path = r"C:\Users\Owner\Desktop\chromedriver\chromedriver.exe"

:white_check_mark: Bonus: Add ChromeDriver to the PATH (optional)

If you want to avoid hardcoding the path every time, you can:

  • Add chromedriver.exe to a folder like C:\tools\
  • Add that folder to your system PATH
  • Then remove service=Service(...) and just use:
driver = webdriver.Chrome(options=chrome_options)

But for now, just hardcoding the full path should get you running.


Since you’re using Brave, not Chrome, you’ll still use ChromeDriver, because Brave is Chromium-based and works with ChromeDriver. If you switch back to Chrome, the process is more or less the same, and you might not even need this troubleshooting section.


:white_check_mark: Step-by-step for Brave and Probably Also Mostly for Chrome

Brave version numbers don’t match Chrome exactly, but they’re usually close. Let’s match your Brave version to a compatible ChromeDriver version.


Step 1: :white_check_mark: Find Brave’s Chromium version

In Brave, go to:

brave://version

Look for something like:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...
Chrome/123.0.6312.86 Safari/537.36

This will tell us your Chromium version, which is the one ChromeDriver uses.
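
Only the major version (the part before the first dot) needs to line up with ChromeDriver. Here is a small sketch for pulling it out of the user-agent string. `chromium_major_version` is a hypothetical helper, and the sample string is the truncated one from above:

```python
import re
from typing import Optional


def chromium_major_version(user_agent: str) -> Optional[int]:
    """Pull the Chromium major version out of a user-agent string."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None


ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ... Chrome/123.0.6312.86 Safari/537.36"
print(chromium_major_version(ua))
```

With the sample string this prints 123, which is the number to look for on the ChromeDriver download page.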


Step 2: :white_check_mark: Download the matching ChromeDriver

  • Go here: Chrome for Testing download
  • Find the matching major version (e.g., 123 for Chrome/123.0.6312.86)
  • Download:
    • Platform: chromedriver-win64.zip
    • Extract to: C:\Users\Owner\Desktop\chromedriver\chromedriver.exe

Step 3: :white_check_mark: Update the script

In your frank9.py, set:

chromedriver_path = r"C:\Users\Owner\Desktop\chromedriver\chromedriver.exe"

Also, you must tell Selenium to use Brave instead of Chrome:

Add this after chrome_options = Options():

chrome_options.binary_location = r"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe"

(If that’s not the path, check where Brave is installed and update accordingly.)

Version 2.0 (not really any better as yet, but getting closer to support for recursion aka branches / forks of messages)

Version 2.0 adds an interesting zombie mouse and the color red: a red box highlights where our bot is in space. After many hours of head pounding, I have determined that you must have the zombie mouse in order to activate hover, but I’m still having trouble with CSS elements. I hate CSS. “I HATE feats of strength”…
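
Since hover only fires when the physical cursor is actually over the message, the conversion from browser coordinates to screen coordinates matters. Here is a rough sketch of one way to do it, working from the element's viewport rectangle and the browser window's own geometry. Everything here is an assumption to tune: `element_center_on_screen` is a hypothetical helper, the side borders are assumed symmetric, and `scale` stands in for display scaling:

```python
from typing import Dict, Tuple


def element_center_on_screen(rect: Dict[str, float],
                             win: Dict[str, float],
                             scale: float = 1.0) -> Tuple[int, int]:
    """Convert a viewport-relative element rect to screen coordinates.

    rect: left/top/width/height, e.g. from getBoundingClientRect().
    win:  screenX, screenY, outerWidth, outerHeight, innerWidth, innerHeight
          read from the browser's window object.
    scale: display scaling factor (1.0 means 100%).
    """
    # Assumption: left, right and bottom window borders are the same width,
    # and the rest of the vertical difference is the tab strip and toolbar.
    border = (win["outerWidth"] - win["innerWidth"]) / 2
    chrome_top = (win["outerHeight"] - win["innerHeight"]) - border
    x = win["screenX"] + border + rect["left"] + rect["width"] / 2
    y = win["screenY"] + chrome_top + rect["top"] + rect["height"] / 2
    return round(x * scale), round(y * scale)


# With Selenium, the two dictionaries could be read roughly like this
# (after scrollIntoView, because the rect is relative to the viewport):
#   rect = driver.execute_script(
#       "const r = arguments[0].getBoundingClientRect();"
#       "return {left: r.left, top: r.top, width: r.width, height: r.height};", elem)
#   win = driver.execute_script(
#       "return {screenX: window.screenX, screenY: window.screenY,"
#       " outerWidth: window.outerWidth, outerHeight: window.outerHeight,"
#       " innerWidth: window.innerWidth, innerHeight: window.innerHeight};")

rect = {"left": 300, "top": 200, "width": 400, "height": 100}
win = {"screenX": 100, "screenY": 50, "outerWidth": 1216, "outerHeight": 900,
       "innerWidth": 1200, "innerHeight": 800}
print(element_center_on_screen(rect, win))
```

The resulting pair is what would be handed to `pyautogui.moveTo`. If the cursor lands consistently off by the same amount, the border and toolbar assumptions are the first thing to adjust.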


The prerequisite for V2.0 is our mouse override library, basically. Don’t worry, it only overrides your mouse per the instructions below:

py -3.11 -m pip install pyautogui pygetwindow

:test_tube: 3. Start Brave in Remote Debugging Mode

As before, close all Brave windows, run this before each run of our bot, log in to ChatGPT, and proceed with the script by pressing Enter.

"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe" --remote-debugging-port=9222 --user-data-dir="C:\Temp\BraveProfile"

:play_button: 4. Run Script

py -3.11 frank10.py

frank10.py

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
import time, os, re

# === NEW imports for physical mouse movement ===
import pyautogui
import pygetwindow as gw

# === Setup to attach to existing Brave instance ===
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "localhost:9222")
chrome_options.binary_location = r"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe"

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://chat.openai.com/")
input("Login, open a conversation, then press Enter to continue...")

os.makedirs("convos", exist_ok=True)
visited_forks = set()

# === Enumerate available windows ===
print("🪟 Enumerating all open windows:")
for win in gw.getAllWindows():
    print(f"  - {win.title} | visible: {win.visible} | active: {win.isActive} | pos: ({win.left},{win.top})")

# === PHYSICAL CURSOR HOVER ===
def move_physical_cursor_to_element(elem):
    try:
        loc = elem.location
        size = elem.size
        center_x = loc['x'] + size['width'] // 2
        center_y = loc['y'] + size['height'] // 2

        win = next((w for w in gw.getAllWindows() if "ChatGPT" in w.title and w.visible), None)

        if not win:
            print("⚠️ Could not find ChatGPT window — using fallback position (100, 200)")
            screen_x = 100 + center_x
            screen_y = 200 + center_y
        else:
            screen_x = win.left + center_x
            screen_y = win.top + center_y
            print(f"🖱️ Moving mouse to screen ({screen_x}, {screen_y}) — window: {win.title}")

        pyautogui.moveTo(screen_x, screen_y, duration=0.5)
        time.sleep(0.5)
    except Exception as e:
        print(f"⚠️ Failed physical mouse move: {e}")

def hover_all_messages():
    print("🖱️ Physically hovering all .group/turn-messages blocks...")
    wrappers = driver.find_elements(By.CSS_SELECTOR, "div.group\\/turn-messages")
    for i, wrapper in enumerate(wrappers):
        try:
            driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", wrapper)
            driver.execute_script("arguments[0].style.border='2px solid red'", wrapper)
            move_physical_cursor_to_element(wrapper)
        except Exception as e:
            print(f"⚠️ Hover fail on wrapper {i}: {e}")

def scroll_sidebar_to_bottom():
    try:
        sidebar = driver.find_element(By.CSS_SELECTOR, 'nav[aria-label="Chat history"]')
        print("📜 Scrolling sidebar...")
        prev_count, retries, max_retries = 0, 0, 15

        while retries < max_retries:
            driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", sidebar)
            time.sleep(1.5)
            driver.execute_script("arguments[0].scrollTop -= 50", sidebar)
            time.sleep(0.3)
            driver.execute_script("arguments[0].scrollTop += 50", sidebar)
            time.sleep(0.3)

            count = len(driver.find_elements(By.CSS_SELECTOR, "a[href^='/c/']"))
            print(f"   Loaded {count} conversations...", end="\r")
            retries = retries + 1 if count == prev_count else 0
            prev_count = count

        print(f"\n✅ Sidebar scroll complete. {prev_count} conversations loaded.")

    except Exception as e:
        print(f"❌ Sidebar scroll failed: {e}")

def scroll_chat(max_wait_seconds=60):
    try:
        container = driver.find_element(By.CSS_SELECTOR, "main div[class*='overflow-y-auto']")
        print("↕️ Scrolling conversation...")
        last_height = driver.execute_script("return arguments[0].scrollHeight", container)
        start_time = time.time()

        while time.time() - start_time < max_wait_seconds:
            driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", container)
            time.sleep(1.5)
            new_height = driver.execute_script("return arguments[0].scrollHeight", container)
            if new_height == last_height:
                break
            last_height = new_height

        print("✅ Chat scroll complete.")
    except Exception as e:
        print(f"⚠️ Failed scrolling chat: {e}")

def wait_for_non_empty_text(elem, timeout=5):
    try:
        WebDriverWait(driver, timeout).until(lambda d: elem.text.strip() != "")
        return elem.text.strip()
    except Exception:
        return ""

def get_branch_navigation_elements():
    branch_navs = []
    counters = driver.find_elements(By.CSS_SELECTOR, "div.tabular-nums")
    print(f"🔍 Found {len(counters)} branch counters...")

    for counter in counters:
        try:
            raw_text = wait_for_non_empty_text(counter)
            print("📦 Raw counter text:", repr(raw_text))
            if not re.match(r"\s*\d+/\d+\s*", raw_text):
                print("⚠️ Counter failed regex. Skipping.")
                continue

            nav_wrapper = counter.find_element(
                By.XPATH,
                "./parent::div[contains(@class, 'flex') and (descendant::button[@aria-label='Previous response'] or descendant::button[@aria-label='Next response'])]"
            )

            # find_elements returns an empty list (instead of raising) when a button is absent or disabled,
            # so a counter sitting at its first or last version is no longer skipped.
            left_matches = nav_wrapper.find_elements(By.XPATH, ".//button[@aria-label='Previous response' and not(@disabled)]")
            right_matches = nav_wrapper.find_elements(By.XPATH, ".//button[@aria-label='Next response' and not(@disabled)]")
            left_button = left_matches[0] if left_matches else None
            right_button = right_matches[0] if right_matches else None

            print(f"✅ Counter {raw_text} | LEFT: {bool(left_button)} | RIGHT: {bool(right_button)}")
            branch_navs.append({"counter": raw_text, "left": left_button, "right": right_button})

        except Exception as e:
            print(f"❌ Exception parsing counter: {e}")
            continue

    return branch_navs

def cycle_through_branches(nav_info, message_index, depth=0):
    try:
        total_versions = int(nav_info["counter"].split("/")[1])
        current_version = int(nav_info["counter"].split("/")[0])

        while current_version < total_versions:
            print(f"{'  ' * depth}🔁 Switching {current_version} → {current_version + 1}")
            if nav_info["right"]:
                driver.execute_script("arguments[0].click();", nav_info["right"])
                time.sleep(2.5)
                hover_all_messages()
                branch_blocks = get_branch_navigation_elements()
                if len(branch_blocks) > message_index:
                    nav_info = branch_blocks[message_index]
                    current_version = int(nav_info["counter"].split("/")[0])
                else:
                    break
            else:
                break
    except Exception as e:
        print(f"{'  ' * depth}⚠️ Error cycling: {e}")

def expand_all_forks_and_branches(max_depth=10, current_depth=0):
    if current_depth >= max_depth:
        print(f"🧱 Reached max recursion depth {max_depth}")
        return

    time.sleep(2)
    hover_all_messages()

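    # SVG elements are namespaced, so match them with local-name() rather than a plain svg step.
    fork_buttons = driver.find_elements(By.XPATH, "//button[.//*[local-name()='svg' and @data-testid='PlusIcon'] or contains(., 'Continue from here')]")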
    print(f"{'  ' * current_depth}🔀 Fork buttons: {len(fork_buttons)}")

    for idx, btn in enumerate(fork_buttons):
        key = f"{btn.location['x']}_{btn.location['y']}_{btn.text.strip()}"
        if key in visited_forks:
            continue
        visited_forks.add(key)

        try:
            driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", btn)
            driver.execute_script("arguments[0].click();", btn)
            print(f"{'  ' * current_depth}➡️ Clicked fork {idx+1}")
            time.sleep(3)
            expand_all_forks_and_branches(max_depth, current_depth + 1)
        except Exception as e:
            print(f"{'  ' * current_depth}⚠️ Error clicking fork: {e}")

    branch_blocks = get_branch_navigation_elements()
    print(f"{'  ' * current_depth}🧬 Branch nav blocks: {len(branch_blocks)}")
    for i, info in enumerate(branch_blocks):
        cycle_through_branches(info, i, current_depth)

def save_chat_html(title_prefix):
    try:
        scroll_chat()
        container = driver.find_element(By.CSS_SELECTOR, "main div[class*='overflow-y-auto']")
        convo_html = container.get_attribute('innerHTML')
        safe_title = re.sub(r'[<>:"/\\\\|?*]+', "_", title_prefix)
        filename = f"convos/{safe_title[:80]}.html"
        with open(filename, "w", encoding="utf-8") as f:
            f.write(convo_html)
        print(f"✅ Saved: {filename}")
    except Exception as e:
        print(f"⚠️ Error saving chat: {e}")

# === MAIN SCRIPT ===

scroll_sidebar_to_bottom()
conversation_links = driver.find_elements(By.CSS_SELECTOR, "a[href^='/c/']")
print(f"📄 Found {len(conversation_links)} conversations.\n")

for idx in range(len(conversation_links)):
    try:
        conversation_links = driver.find_elements(By.CSS_SELECTOR, "a[href^='/c/']")
        link = conversation_links[idx]
        title = link.text.strip() or f"untitled_{idx+1}"
        print(f"\n🔄 Processing [{idx+1}/{len(conversation_links)}]: {title}")

        driver.execute_script("arguments[0].click();", link)
        time.sleep(5)

        expand_all_forks_and_branches(max_depth=10)
        scroll_chat()
        save_chat_html(title)

    except Exception as e:
        print(f"❌ Error processing conversation {idx+1}: {e}")
        time.sleep(2)

print("\n🎉 Done scraping all conversations and forks.")
driver.quit()

As written, this scrolls down with the mouse stuck in place. We need to point the mouse at the correct elements, so we are very close, but… “I HATE feats of strength”. Fixing this will require homing in on the correct mouse placement and probably a little follow-up logic for navigating the arrows.
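
For the arrow logic, one thing to note is that `cycle_through_branches` only ever clicks right from wherever the counter starts, so versions before the current one are never shown. Here is a minimal sketch of a click plan that rewinds first. `parse_counter` and `plan_clicks` are hypothetical helpers, and the "prev"/"next" labels stand in for the Previous response and Next response buttons:

```python
import re
from typing import List, Optional, Tuple


def parse_counter(text: str) -> Optional[Tuple[int, int]]:
    """Parse a branch counter such as '2/3' into (current, total)."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*", text)
    if not match:
        return None
    current, total = int(match.group(1)), int(match.group(2))
    return (current, total) if 1 <= current <= total else None


def plan_clicks(current: int, total: int) -> List[str]:
    """Clicks that show every version at least once, starting from `current`."""
    # Rewind to version 1 first, then step right through all the others.
    return ["prev"] * (current - 1) + ["next"] * (total - 1)


parsed = parse_counter("2/3")
if parsed is not None:
    print(plan_clicks(*parsed))
```

Saving the HTML after each click in the plan (instead of once at the end) is probably what it takes to actually capture every branch.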