This is a zip which contains all the ratio- and equivalence-removed parse runs to date, sans pretty graphics. I am using it to determine which script produces the best fit, as well as which script is best at flagging “known good data” that may in fact be “bad data.”
micro-bot-digest.zip (359.3 KB)
You may reference this zip to a bot using the following link:
https://zchg.org/uploads/short-url/O2drSQLFmlLaraJJDGrmkh6fgX.zip
Keep in mind that a bot generated the following list, which does not consider the output. The output is more important than the input, but I needed to pare things down…
Usefulness Scores
- allascii.py - 5/10
- Unique Purpose: Establishes the foundational symbolic fitting model for CODATA constants. It introduces the D function, invert_D for finding symbolic dimensions (n, beta), and the initial parsing of allascii.txt. It’s the conceptual starting point for the “symbolic fit” project (a short D/invert_D usage sketch follows the list below).
- Evolution: This is the earliest version provided for the allascii series, setting the groundwork for all subsequent versions.
- allascii2a.py - 6/10
- Unique Purpose: Introduces global parameter optimization for the symbolic model (r, k, Omega, base) using scipy.optimize.minimize. This significantly enhances the model’s ability to fit constants by finding optimal global parameters rather than relying on fixed defaults. It also adds basic error plotting.
- Evolution: Builds upon allascii.py by adding a crucial optimization step for the global parameters, moving from a fixed-parameter model to an optimized one.
- allascii2d.py - 6/10
- Unique Purpose: Continues the development of the symbolic fitting. While functionally similar to allascii2b.py in terms of parameters, it introduces a change in the minimize method (to ‘Nelder-Mead’ with fewer iterations in the snippet).
- Evolution: Appears to be an iterative refinement of allascii2b.py, potentially experimenting with different optimization algorithms or settings.
- allascii2b.py - 7/10
- Unique Purpose: Expands the global optimization to include a scale parameter in the symbolic model, further increasing the model’s flexibility and fitting capabilities.
- Evolution: Direct evolution of allascii2a.py, adding another degree of freedom (scale) to the global optimization.
- cosmos1.py - 7/10
- Unique Purpose: Initiates the exploration of a “cosmological” symbolic model. It applies the D function and related concepts to cosmological parameters like Omega0, alpha, gamma, r0, s0, and beta to model the emergent speed of light c(z) and a gravitational function G(z). It includes fitting to supernova data.
- Evolution: Represents a branching application of the core symbolic model into the domain of cosmology, distinct from constant fitting.
- fudge1.py - 7/10
- Unique Purpose: Reintroduces the symbolic fitting concept (similar to allascii but likely a separate branch or re-development) for physical constants, focusing on detailed output and basic error plotting. It also includes parallel processing from the start (using joblib).
- Evolution: Likely a re-implementation or direct continuation of the allascii ideas, incorporating parallel processing and a more structured output for symbolic constant fitting.
- allascii2e.py - 8/10
- Unique Purpose: Focuses on performance improvement by introducing joblib.Parallel for parallelizing the constant fitting process. This significantly speeds up the computation when fitting many constants. It also switches back to ‘L-BFGS-B’ for optimization with more iterations.
- Evolution: A major performance enhancement over previous allascii versions due to parallelization.
- allascii2f.py - 8/10
- Unique Purpose: Functionally very similar to allascii2e.py, likely representing a minor revision or a checkpoint in development without major new features evident in the snippets. It retains the joblib parallelization and ‘L-BFGS-B’ optimization.
- Evolution: Almost identical to allascii2e.py, indicating a stable point or minor internal adjustments.
- allascii2g.py - 8/10
- Unique Purpose: Functionally very similar to allascii2e.py and allascii2f.py. It continues with parallelization and ‘L-BFGS-B’ optimization.
- Evolution: Another iterative revision with no major visible functional changes from allascii2e.py or allascii2f.py.
- cosmos2.py - 8/10
- Unique Purpose: An optimized and more numerically stable version of cosmos1.py. It includes enhanced error handling in the fib_real and D functions, especially for preventing overflows and non-finite results, making the cosmological model more robust.
- Evolution: Refines cosmos1.py by addressing numerical stability issues, making the cosmological simulations more reliable.
- fudge5.py - 8/10
- Unique Purpose: Expands the output of the symbolic fit to include “emergent uncertainty,” r_local, and k_local in the results, suggesting a deeper analysis of the fitted parameters and their implications for uncertainty. It also explicitly sets up logging.
- Evolution: Builds on fudge1.py by adding more detailed output metrics and improved logging, enhancing the analytical capabilities.
- fudge7.py - 8/10
- Unique Purpose: Similar to fudge5.py, focusing on robust logging and detailed output for symbolic fits. No major new features are immediately apparent from the provided snippet beyond general stability and reporting.
- Evolution: Appears to be a minor iteration on fudge5.py, potentially focusing on internal code quality or very subtle adjustments.
- allascii2h.py - 9/10
- Unique Purpose: Enhances the analysis and reporting by explicitly flagging and summarizing “fudged” constants. This indicates an attempt to categorize or identify constants that do not fit the model well, providing deeper insights into the model’s limitations or potential data issues.
- Evolution: Builds upon the parallelized allascii versions by adding more sophisticated reporting and data categorization.
- fudge10_fixed.py - 9/10
- Unique Purpose: Provides a comprehensive suite of plotting functionalities for the symbolic fit results, including histograms, scatter plots (error vs. n), bar charts of worst fits, and scatter plots of emergent vs. CODATA values. It’s designed for thorough visual analysis.
- Evolution: A significant enhancement in data visualization and result interpretation compared to previous fudge versions.
- gpu1_optimized2.py - 9/10
- Unique Purpose: An optimized version of the symbolic fitting code, likely focusing on performance and numerical stability, possibly with an eye towards GPU acceleration (though no explicit GPU code is visible in snippets, the name suggests it). Includes extensive logging and plotting similar to fudge10_fixed.py.
- Evolution: Continues the work of the fudge series with a strong emphasis on optimization and robust reporting.
- gpu2.py - 9/10
- Unique Purpose: Very similar to gpu1_optimized2.py, maintaining the optimized fitting process, comprehensive logging, and plotting capabilities. No significant new features visible in the snippet.
- Evolution: A minor iteration or checkpoint from gpu1_optimized2.py.
- gpu3.py - 9/10
- Unique Purpose: Introduces a generate_primes function to create the PRIMES list programmatically, rather than hardcoding it. This makes the script more flexible for varying prime limits. It also maintains robust numerical handling and comprehensive plotting.
- Evolution: Improves the maintainability and flexibility of the prime number generation, a core component of the symbolic model.
- gpu4.py - 10/10
- Unique Purpose: Similar to gpu3.py, it continues to use the dynamically generated PRIMES list and comprehensive plotting. It also includes specific output table customizations.
- Evolution: A very stable and refined version of the gpu series, incorporating the best practices and extensive reporting from previous versions.
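Since every script above hinges on the same mechanism, here is a minimal sketch of the D/invert_D round trip as the gpu-series scripts below use it. It assumes D and invert_D are in scope from gpu4.py (with the defaults defined there); the target value is just the CODATA elementary charge, used purely for illustration.

# Minimal D / invert_D round trip (assumes D and invert_D from gpu4.py are in scope)
value = 1.602176634e-19  # elementary charge, an illustrative target only
result = invert_D(abs(value))  # grid search over (n, beta, scale, r, k) so that D(...) ~ |value|
if result[0] is not None:
    n, beta, dynamic_scale, emergent_uncertainty, r_local, k_local = result
    approx = D(n, beta, r_local, k_local, Omega=1.0, base=2, scale=dynamic_scale)
    rel_error = abs(approx - abs(value)) / abs(value)
    print(n, beta, rel_error, emergent_uncertainty)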
Silly bot, gpu4.py can’t even graph. You have to manually generate graphs and outliers (included in the folder). But it does seem to eat error for breakfast, and seeing as how all of our constants come out at n = 0, it sets the stage for cosmos (macro) + micro. Of course we can easily add plotting functionality back in, but I chose to plot manually since this is a stepping-stone script; I’m an eager beaver to get to cosmos + micro scales.
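For anyone who wants to regenerate graphs and outlier charts from a finished run without re-running the fit, here is a minimal sketch. It assumes the tab-separated symbolic_fit_results.txt written by gpu4.py is in the working directory; the column names are the ones the script writes.

import pandas as pd
import matplotlib.pyplot as plt

# Load the final fit table written by gpu4.py (tab-separated).
df = pd.read_csv("symbolic_fit_results.txt", sep="\t")

# Histogram of absolute errors.
plt.figure(figsize=(10, 5))
plt.hist(df["error"].dropna(), bins=50, color="skyblue", edgecolor="black")
plt.title("Histogram of Absolute Errors in Symbolic Fit")
plt.xlabel("Absolute Error")
plt.ylabel("Count")
plt.tight_layout()
plt.savefig("histogram_errors.png")
plt.close()

# Bar chart of the 20 worst fits (outlier candidates) by relative error.
worst = df.sort_values("rel_error", ascending=False, na_position="last").head(20)
plt.figure(figsize=(10, 5))
plt.bar(worst["name"], worst["rel_error"], color="salmon", edgecolor="black")
plt.xticks(rotation=90)
plt.title("Relative Errors for 20 Worst Symbolic Fits")
plt.tight_layout()
plt.savefig("bar_worst_fits.png")
plt.close()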
gpu4.py
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
import logging
import time
import matplotlib.pyplot as plt
import os
import signal
import sys
# Set up logging
logging.basicConfig(filename='symbolic_fit_optimized.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
# Generate primes with a sieve of Eratosthenes
def generate_primes(n):
sieve = [True] * (n + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(np.sqrt(n)) + 1):
if sieve[i]:
for j in range(i * i, n + 1, i):
sieve[j] = False
return [i for i in range(n + 1) if sieve[i]]
PRIMES = generate_primes(104729)[:10000] # First 10,000 primes, up to ~104,729
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
if n > 100:
return 0.0
term1 = phi**n / np.sqrt(5)
term2 = ((1/phi)**n) * np.cos(np.pi * n)
result = term1 - term2
fib_cache[n] = result
return result
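# D(n, beta) combines the real-valued Fibonacci extension, a prime lookup, and a dyadic factor:
# D = sqrt(scale * phi * F(n + beta) * base**(n + beta) * PRIMES[floor(n + beta) % len(PRIMES)] * Omega) * r**k,
# with the product clamped to >= 1e-30 before the square root.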
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta)) % len(PRIMES) # Simplified index to avoid offset
Pn_beta = PRIMES[idx]
dyadic = base ** (n + beta)
val = scale * phi * Fn_beta * dyadic * Pn_beta * Omega
val = np.maximum(val, 1e-30)
return np.sqrt(val) * (r ** k)
# def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
# try:
# n = np.asarray(n)
# beta = np.asarray(beta)
# r = np.asarray(r)
# k = np.asarray(k)
# Omega = np.asarray(Omega)
# base = np.asarray(base)
# scale = np.asarray(scale)
# Fn_beta = fib_real(n + beta)
# idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
# Pn_beta = PRIMES[idx]
# log_dyadic = (n + beta) * np.log(np.maximum(base, 1e-10))
# log_dyadic = np.where((log_dyadic > 700) | (log_dyadic < -700), np.nan, log_dyadic)
# log_val = np.log(np.maximum(scale, 1e-30)) + np.log(phi) + np.log(np.maximum(np.abs(Fn_beta), 1e-30)) + log_dyadic + np.log(np.maximum(Omega, 1e-30))
# log_val = np.where(np.abs(n - 1000) < 1e-3, log_val, log_val + np.log(np.clip(np.log(np.maximum(n, 1e-10)) / np.log(1000), 1e-10, np.inf)))
# val = np.where(np.isfinite(log_val), np.exp(log_val) * np.sign(Fn_beta), np.nan)
# result = np.sqrt(np.maximum(np.abs(val), 1e-30)) * (r ** k) * np.sign(val)
# return result
# except Exception as e:
# logging.error(f"D failed: n={n}, beta={beta}, r={r}, k={k}, Omega={Omega}, base={base}, scale={scale}, error={e}")
# return None
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=10000, steps=5000):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
scale_factors = np.logspace(max(log_val - 5, -20), min(log_val + 5, 20), num=20)
max_n = min(50000, max(1000, int(1000 * abs(log_val))))
steps = min(10000, max(1000, int(500 * abs(log_val))))
n_values = np.logspace(0, np.log10(max_n), steps) if log_val > 3 else np.linspace(0, max_n, steps)
r_values = [0.5, 1.0, 2.0]
k_values = [0.5, 1.0, 2.0]
try:
# Try regular D for positive exponents
for n in n_values:
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
for r_val in r_values:
for k_val in k_values:
val = D(n, beta, r_val, k_val, Omega, base, scale * dynamic_scale)
if val is not None and np.isfinite(val):
diff = abs(val - abs(value))
candidates.append((diff, n, beta, dynamic_scale, r_val, k_val))
# Try inverse D for negative exponents (e.g., G)
for n in n_values:
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
for r_val in r_values:
for k_val in k_values:
val = 1 / D(n, beta, r_val, k_val, Omega, base, scale * dynamic_scale)
if val is not None and np.isfinite(val):
diff = abs(val - abs(value))
candidates.append((diff, n, beta, dynamic_scale, r_val, k_val))
if not candidates:
logging.error(f"invert_D: No valid candidates for value {value}")
return None, None, None, None, None, None
candidates = sorted(candidates, key=lambda x: x[0])[:10]
valid_vals = [D(n, beta, r, k, Omega, base, scale * s) if diff < 1e-10 else 1 / D(n, beta, r, k, Omega, base, scale * s)
for diff, n, beta, s, r, k in candidates]
valid_vals = [v for v in valid_vals if v is not None and np.isfinite(v)]
emergent_uncertainty = np.std(valid_vals) if len(valid_vals) > 1 else abs(valid_vals[0]) * 0.01 if valid_vals else 1e-10
best = candidates[0]
return best[1], best[2], best[3], emergent_uncertainty, best[4], best[5]
except Exception as e:
logging.error(f"invert_D failed for value {value}: {e}")
return None, None, None, None, None, None
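# Expected input: tab-separated file with a header row and columns
# name, value, uncertainty, unit, category; the literal string 'exact' in the
# uncertainty column is read as NaN and then filled with 0.0.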
def parse_categorized_codata(filename):
try:
df = pd.read_csv(filename, sep='\t', header=0,
names=['name', 'value', 'uncertainty', 'unit', 'category'],
dtype={'name': str, 'value': float, 'uncertainty': float, 'unit': str, 'category': str},
na_values=['exact'])
df['uncertainty'] = df['uncertainty'].fillna(0.0)
required_columns = ['name', 'value', 'uncertainty', 'unit']
if not all(col in df.columns for col in required_columns):
missing = [col for col in required_columns if col not in df.columns]
raise ValueError(f"Missing required columns in {filename}: {missing}")
logging.info(f"Successfully parsed {len(df)} constants from {filename}")
return df
except FileNotFoundError:
logging.error(f"Input file {filename} not found")
raise
except Exception as e:
logging.error(f"Error parsing {filename}: {e}")
raise
def generate_emergent_constants(n_max=1000, beta_steps=10, r_values=[0.5, 1.0, 2.0], k_values=[0.5, 1.0, 2.0], Omega=1.0, base=2, scale=1.0):
candidates = []
n_values = np.linspace(0, n_max, 100)
beta_values = np.linspace(0, 1, beta_steps)
for n in tqdm(n_values, desc="Generating emergent constants"):
for beta in beta_values:
for r in r_values:
for k in k_values:
val = D(n, beta, r, k, Omega, base, scale)
if val is not None and np.isfinite(val):
candidates.append({
'n': n, 'beta': beta, 'value': val, 'r': r, 'k': k, 'scale': scale
})
return pd.DataFrame(candidates)
def match_to_codata(df_emergent, df_codata, tolerance=0.01, batch_size=100):
matches = []
output_file = "emergent_constants.txt"
with open(output_file, 'w', encoding='utf-8') as f:
pd.DataFrame(columns=['name', 'codata_value', 'emergent_value', 'n', 'beta', 'r', 'k', 'scale', 'error', 'rel_error', 'codata_uncertainty', 'bad_data', 'bad_data_reason']).to_csv(f, sep="\t", index=False)
for start in range(0, len(df_codata), batch_size):
batch = df_codata.iloc[start:start + batch_size]
for _, codata_row in tqdm(batch.iterrows(), total=len(batch), desc=f"Matching constants batch {start//batch_size + 1}"):
value = codata_row['value']
mask = abs(df_emergent['value'] - value) / max(abs(value), 1e-30) < tolerance
matched = df_emergent[mask]
for _, emergent_row in matched.iterrows():
error = abs(emergent_row['value'] - value)
rel_error = error / max(abs(value), 1e-30)
matches.append({
'name': codata_row['name'],
'codata_value': value,
'emergent_value': emergent_row['value'],
'n': emergent_row['n'],
'beta': emergent_row['beta'],
'r': emergent_row['r'],
'k': emergent_row['k'],
'scale': emergent_row['scale'],
'error': error,
'rel_error': rel_error,
'codata_uncertainty': codata_row['uncertainty'],
'bad_data': rel_error > 0.5 or (codata_row['uncertainty'] is not None and abs(codata_row['uncertainty'] - error) > 10 * codata_row['uncertainty']),
'bad_data_reason': f"High rel_error ({rel_error:.2e})" if rel_error > 0.5 else f"Uncertainty deviation ({codata_row['uncertainty']:.2e} vs. {error:.2e})" if (codata_row['uncertainty'] is not None and abs(codata_row['uncertainty'] - error) > 10 * codata_row['uncertainty']) else ""
})
try:
with open(output_file, 'a', encoding='utf-8') as f:
pd.DataFrame(matches).to_csv(f, sep="\t", index=False, header=False, lineterminator='\n')
f.flush()
matches = []
except Exception as e:
logging.error(f"Failed to save batch {start//batch_size + 1} to {output_file}: {e}")
return pd.DataFrame(pd.read_csv(output_file, sep='\t'))
def check_physical_consistency(df_results):
bad_data = []
relations = [
('Planck constant', 'reduced Planck constant', lambda x, y: abs(x['scale'] / y['scale'] - 2 * np.pi), 0.1, 'scale ratio vs. 2π'),
('proton mass', 'proton-electron mass ratio', lambda x, y: abs(x['n'] - y['n'] - np.log10(1836)), 0.5, 'n difference vs. log(proton-electron ratio)'),
('Fermi coupling constant', 'weak mixing angle', lambda x, y: abs(x['scale'] - y['scale'] / np.sqrt(2)), 0.1, 'scale vs. sin²θ_W/√2'),
('tau energy equivalent', 'tau mass energy equivalent in MeV', lambda x, y: abs(x['codata_value'] - y['codata_value']), 0.01, 'value consistency'),
('proton mass', 'electron mass', 'proton-electron mass ratio',
lambda x, y, z: abs(z['n'] - abs(x['n'] - y['n'])), 10.0, 'n inconsistency for mass ratio'),
('fine-structure constant', 'elementary charge', 'Planck constant',
lambda x, y, z: abs(x['codata_value'] - y['codata_value']**2 / (4 * np.pi * 8.854187817e-12 * z['codata_value'] * 299792458)), 0.01, 'fine-structure vs. e²/(4πε₀hc)'),
('Bohr magneton', 'elementary charge', 'Planck constant',
lambda x, y, z: abs(x['codata_value'] - y['codata_value'] * z['codata_value'] / (2 * 9.1093837e-31)), 0.01, 'Bohr magneton vs. eh/(2m_e)')
]
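# Note: the last two relations use h where the textbook formulas use hbar = h/(2*pi)
# (alpha = e**2 / (4*pi*eps0*hbar*c), mu_B = e*hbar / (2*m_e)), and the 0.01 thresholds
# are absolute rather than relative, so these act only as loose sanity checks.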
for relation in relations:
try:
if len(relation) == 5:
name1, name2, check_func, threshold, reason = relation
if name1 in df_results['name'].values and name2 in df_results['name'].values:
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
if check_func(row1, row2) > threshold:
bad_data.append((name1, f"Physical inconsistency: {reason}"))
bad_data.append((name2, f"Physical inconsistency: {reason}"))
elif len(relation) == 6:
name1, name2, name3, check_func, threshold, reason = relation
if all(name in df_results['name'].values for name in [name1, name2, name3]):
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
row3 = df_results[df_results['name'] == name3].iloc[0]
if check_func(row1, row2, row3) > threshold:
bad_data.append((name3, f"Physical inconsistency: {reason}"))
except Exception as e:
logging.warning(f"Physical consistency check failed for {relation}: {e}")
continue
return bad_data
def total_error(params, df_subset):
r, k, Omega, base, scale = params
df_results = symbolic_fit_all_constants(df_subset, base=base, Omega=Omega, r=r, k=k, scale=scale)
if df_results.empty:
return np.inf
valid_errors = df_results['error'].dropna()
return valid_errors.mean() if not valid_errors.empty else np.inf
def process_constant(row, r, k, Omega, base, scale):
try:
name, value, uncertainty, unit = row['name'], row['value'], row['uncertainty'], row['unit']
abs_value = abs(value)
sign = np.sign(value)
result = invert_D(abs_value, r=r, k=k, Omega=Omega, base=base, scale=scale)
if result[0] is None:
logging.warning(f"No valid fit for {name}")
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': None, 'beta': None, 'emergent_value': None,
'error': None, 'rel_error': None, 'codata_uncertainty': uncertainty, 'emergent_uncertainty': None,
'scale': None, 'bad_data': True, 'bad_data_reason': 'No valid fit found', 'r': None, 'k': None
}
n, beta, dynamic_scale, emergent_uncertainty, r_local, k_local = result
approx = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
if approx is None:
logging.warning(f"D returned None for {name}")
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': None, 'beta': None, 'emergent_value': None,
'error': None, 'rel_error': None, 'codata_uncertainty': uncertainty, 'emergent_uncertainty': None,
'scale': None, 'bad_data': True, 'bad_data_reason': 'D function returned None', 'r': None, 'k': None
}
approx *= sign
error = abs(approx - value)
rel_error = error / max(abs(value), 1e-30) if abs(value) > 0 else np.inf
bad_data = False
bad_data_reason = ""
if rel_error > 0.5:
bad_data = True
bad_data_reason += f"High relative error ({rel_error:.2e} > 0.5); "
if emergent_uncertainty is not None and uncertainty is not None:
if emergent_uncertainty > uncertainty * 20 or emergent_uncertainty < uncertainty / 20:
bad_data = True
bad_data_reason += f"Uncertainty deviates from emergent ({emergent_uncertainty:.2e} vs. {uncertainty:.2e}); "
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': n, 'beta': beta, 'emergent_value': approx,
'error': error, 'rel_error': rel_error, 'codata_uncertainty': uncertainty,
'emergent_uncertainty': emergent_uncertainty, 'scale': scale * dynamic_scale,
'bad_data': bad_data, 'bad_data_reason': bad_data_reason, 'r': r_local, 'k': k_local
}
except Exception as e:
logging.error(f"process_constant failed for {row['name']}: {e}")
return {
'name': row['name'], 'codata_value': row['value'], 'unit': row['unit'], 'n': None, 'beta': None,
'emergent_value': None, 'error': None, 'rel_error': None, 'codata_uncertainty': row['uncertainty'],
'emergent_uncertainty': None, 'scale': None, 'bad_data': True, 'bad_data_reason': f"Processing error: {str(e)}",
'r': None, 'k': None
}
def symbolic_fit_all_constants(df, base=2, Omega=1.0, r=1.0, k=1.0, scale=1.0, batch_size=100):
logging.info("Starting symbolic fit for all constants...")
results = []
output_file = "symbolic_fit_results_emergent_fixed.txt"
with open(output_file, 'w', encoding='utf-8') as f:
pd.DataFrame(columns=['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'rel_error',
'codata_uncertainty', 'emergent_uncertainty', 'scale', 'bad_data', 'bad_data_reason', 'r', 'k']).to_csv(f, sep="\t", index=False)
for start in range(0, len(df), batch_size):
batch = df.iloc[start:start + batch_size]
try:
batch_results = Parallel(n_jobs=12, timeout=120, backend='loky', maxtasksperchild=20)(
delayed(process_constant)(row, r, k, Omega, base, scale)
for row in tqdm(batch.to_dict('records'), total=len(batch), desc=f"Fitting constants batch {start//batch_size + 1}")
)
batch_results = [r for r in batch_results if r is not None]
results.extend(batch_results)
try:
with open(output_file, 'a', encoding='utf-8') as f:
pd.DataFrame(batch_results).to_csv(f, sep="\t", index=False, header=False, lineterminator='\n')
f.flush()
except Exception as e:
logging.error(f"Failed to save batch {start//batch_size + 1} to {output_file}: {e}")
except Exception as e:
logging.error(f"Parallel processing failed for batch {start//batch_size + 1}: {e}")
continue
df_results = pd.DataFrame(results)
if not df_results.empty:
df_results['bad_data'] = df_results.get('bad_data', False)
df_results['bad_data_reason'] = df_results.get('bad_data_reason', '')
for name in df_results['name'].unique():
mask = df_results['name'] == name
if df_results.loc[mask, 'codata_uncertainty'].notnull().any():
uncertainties = df_results.loc[mask, 'codata_uncertainty'].dropna()
if not uncertainties.empty:
Q1, Q3 = np.percentile(uncertainties, [25, 75])
IQR = Q3 - Q1
outlier_mask = (uncertainties < Q1 - 1.5 * IQR) | (uncertainties > Q3 + 1.5 * IQR)
if outlier_mask.any():
df_results.loc[mask & df_results['codata_uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data'] = True
df_results.loc[mask & df_results['codata_uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data_reason'] += 'Uncertainty outlier; '
high_rel_error_mask = df_results['rel_error'] > 0.5
df_results.loc[high_rel_error_mask, 'bad_data'] = True
df_results.loc[high_rel_error_mask, 'bad_data_reason'] += df_results.loc[high_rel_error_mask, 'rel_error'].apply(lambda x: f"High relative error ({x:.2e} > 0.5); ")
high_uncertainty_mask = (df_results['emergent_uncertainty'].notnull()) & (
(df_results['codata_uncertainty'] > 20 * df_results['emergent_uncertainty']) |
(df_results['codata_uncertainty'] < 0.05 * df_results['emergent_uncertainty'])
)
df_results.loc[high_uncertainty_mask, 'bad_data'] = True
df_results.loc[high_uncertainty_mask, 'bad_data_reason'] += df_results.loc[high_uncertainty_mask].apply(
lambda row: f"Uncertainty deviates from emergent ({row['codata_uncertainty']:.2e} vs. {row['emergent_uncertainty']:.2e}); ", axis=1)
bad_data = check_physical_consistency(df_results)
for name, reason in bad_data:
df_results.loc[df_results['name'] == name, 'bad_data'] = True
df_results.loc[df_results['name'] == name, 'bad_data_reason'] += reason + '; '
logging.info("Symbolic fit completed.")
return df_results
def select_worst_names(df, n_select=10):
categories = df['category'].unique()
n_per_category = max(1, n_select // len(categories))
selected = []
for category in categories:
cat_df = df[df['category'] == category]
if len(cat_df) > 0:
n_to_select = min(n_per_category, len(cat_df))
selected.extend(np.random.choice(cat_df['name'], size=n_to_select, replace=False))
if len(selected) < n_select:
remaining = df[~df['name'].isin(selected)]
if len(remaining) > 0:
selected.extend(np.random.choice(remaining['name'], size=n_select - len(selected), replace=False))
return selected[:n_select]
def signal_handler(sig, frame):
print("\nKeyboardInterrupt detected. Saving partial results...")
logging.info("KeyboardInterrupt detected. Exiting gracefully.")
for output_file in ["emergent_constants.txt", "symbolic_fit_results_emergent_fixed.txt"]:
try:
with open(output_file, 'a', encoding='utf-8') as f:
f.flush()
except Exception as e:
logging.error(f"Failed to flush {output_file} on interrupt: {e}")
sys.exit(0)
def main():
signal.signal(signal.SIGINT, signal_handler)
start_time = time.time()
stages = ['Parsing data', 'Generating emergent constants', 'Optimizing parameters', 'Fitting all constants', 'Generating plots']
progress = tqdm(stages, desc="Overall progress")
# Stage 1: Parse data
input_file = "categorized_allascii.txt"
if not os.path.exists(input_file):
raise FileNotFoundError(f"{input_file} not found in the current directory")
df = parse_categorized_codata(input_file)
logging.info(f"Parsed {len(df)} constants")
progress.update(1)
# Stage 2: Generate emergent constants
emergent_df = generate_emergent_constants(n_max=10000, beta_steps=20, r_values=[0.5, 1.0, 2.0], k_values=[0.5, 1.0, 2.0])
matched_df = match_to_codata(emergent_df, df, tolerance=0.05, batch_size=100)
logging.info("Saved emergent constants to emergent_constants.txt")
progress.update(1)
# Stage 3: Optimize parameters
worst_names = select_worst_names(df, n_select=20)
print(f"Selected constants for optimization: {worst_names}")
subset_df = df[df['name'].isin(worst_names)]
if subset_df.empty:
subset_df = df.head(50)
init_params = [0.5, 0.5, 0.5, 2.0, 0.1]
bounds = [(1e-10, 100), (1e-10, 100), (1e-10, 100), (1.5, 20), (1e-10, 1000)]
try:
from scipy.optimize import differential_evolution
res = differential_evolution(total_error, bounds, args=(subset_df,), maxiter=100, popsize=15)
if res.success:
res = minimize(total_error, res.x, args=(subset_df,), bounds=bounds, method='SLSQP', options={'maxiter': 500})
if not res.success:
logging.warning(f"Optimization failed: {res.message}")
r_opt, k_opt, Omega_opt, base_opt, scale_opt = init_params
else:
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
except Exception as e:
logging.error(f"Optimization failed: {e}")
r_opt, k_opt, Omega_opt, base_opt, scale_opt = init_params
print(f"Optimization failed: {e}. Using default parameters.")
progress.update(1)
# Stage 4: Run final fit
df_results = symbolic_fit_all_constants(df, base=base_opt, Omega=Omega_opt, r=r_opt, k=k_opt, scale=scale_opt, batch_size=100)
if not df_results.empty:
with open("symbolic_fit_results.txt", 'w', encoding='utf-8') as f:
df_results.to_csv(f, sep="\t", index=False)
f.flush()
logging.info(f"Saved final results to symbolic_fit_results.txt")
else:
logging.error("No results to save")
progress.update(1)
# Stage 5: Generate plots
df_results_sorted = df_results.sort_values("error", na_position='last')
print("\nTop 20 best symbolic fits:")
print(df_results_sorted.head(20)[['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'codata_uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(df_results_sorted.tail(20)[['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'codata_uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nPotentially bad data constants summary:")
bad_data_df = df_results[df_results['bad_data'] == True][['name', 'codata_value', 'error', 'rel_error', 'codata_uncertainty', 'emergent_uncertainty', 'bad_data_reason']]
bad_data_df = bad_data_df.sort_values('rel_error', ascending=False, na_position='last')
print(bad_data_df.to_string(index=False))
print("\nTop 20 emergent constants matches:")
matched_df_sorted = matched_df.sort_values('error', na_position='last')
print(matched_df_sorted.head(20)[['name', 'codata_value', 'emergent_value', 'n', 'beta', 'error', 'rel_error', 'codata_uncertainty', 'bad_data', 'bad_data_reason']].to_string(index=False))
plt.figure(figsize=(10, 5))
plt.hist(df_results_sorted['error'].dropna(), bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.savefig('histogram_errors.png')
plt.close()
plt.figure(figsize=(10, 5))
plt.scatter(df_results_sorted['n'], df_results_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n (Fitted)')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.savefig('scatter_n_error.png')
plt.close()
plt.figure(figsize=(10, 5))
plt.bar(matched_df_sorted.head(20)['name'], matched_df_sorted.head(20)['rel_error'], color='purple', edgecolor='black')
plt.xticks(rotation=90)
plt.title('Relative Errors for Top 20 Emergent Constants')
plt.xlabel('Constant Name')
plt.ylabel('Relative Error')
plt.grid(True)
plt.tight_layout()
plt.savefig('bar_emergent_errors.png')
plt.close()
logging.info(f"Total runtime: {time.time() - start_time:.2f} seconds")
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
signal_handler(None, None)
gpu3.py
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
import logging
import time
import matplotlib.pyplot as plt
import os
import signal
import sys
# Set up logging
logging.basicConfig(filename='symbolic_fit_optimized.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
# Generate primes with a sieve of Eratosthenes
def generate_primes(n):
sieve = [True] * (n + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(np.sqrt(n)) + 1):
if sieve[i]:
for j in range(i * i, n + 1, i):
sieve[j] = False
return [i for i in range(n + 1) if sieve[i]]
PRIMES = generate_primes(104729)[:10000] # First 10,000 primes, up to ~104,729
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
if n > 100:
return 0.0
term1 = phi**n / np.sqrt(5)
term2 = ((1/phi)**n) * np.cos(np.pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta)) % len(PRIMES) # Simplified index to avoid offset
Pn_beta = PRIMES[idx]
dyadic = base ** (n + beta)
val = scale * phi * Fn_beta * dyadic * Pn_beta * Omega
val = np.maximum(val, 1e-30)
return np.sqrt(val) * (r ** k)
# def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
# try:
# n = np.asarray(n)
# beta = np.asarray(beta)
# r = np.asarray(r)
# k = np.asarray(k)
# Omega = np.asarray(Omega)
# base = np.asarray(base)
# scale = np.asarray(scale)
# Fn_beta = fib_real(n + beta)
# idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
# Pn_beta = PRIMES[idx]
# log_dyadic = (n + beta) * np.log(np.maximum(base, 1e-10))
# log_dyadic = np.where((log_dyadic > 700) | (log_dyadic < -700), np.nan, log_dyadic)
# log_val = np.log(np.maximum(scale, 1e-30)) + np.log(phi) + np.log(np.maximum(np.abs(Fn_beta), 1e-30)) + log_dyadic + np.log(np.maximum(Omega, 1e-30))
# log_val = np.where(np.abs(n - 1000) < 1e-3, log_val, log_val + np.log(np.clip(np.log(np.maximum(n, 1e-10)) / np.log(1000), 1e-10, np.inf)))
# val = np.where(np.isfinite(log_val), np.exp(log_val) * np.sign(Fn_beta), np.nan)
# result = np.sqrt(np.maximum(np.abs(val), 1e-30)) * (r ** k) * np.sign(val)
# return result
# except Exception as e:
# logging.error(f"D failed: n={n}, beta={beta}, r={r}, k={k}, Omega={Omega}, base={base}, scale={scale}, error={e}")
# return None
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
max_n = min(1000, max(200, int(100 * abs(log_val) + 100)))
n_values = np.linspace(0, max_n, 10) # Reduced resolution
scale_factors = np.logspace(max(log_val - 5, -20), min(log_val + 5, 20), num=2) # Tighter range
r_values = [0.5, 1.0, 2.0]
k_values = [0.5, 1.0, 2.0]
try:
r_grid, k_grid = np.meshgrid(r_values, k_values)
r_grid = r_grid.ravel()
k_grid = k_grid.ravel()
for n in n_values:
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
vals = D(n, beta, r_grid, k_grid, Omega, base, scale * dynamic_scale)
if vals is None or not np.all(np.isfinite(vals)):
continue
diffs = np.abs(vals - abs(value))
rel_diffs = diffs / max(abs(value), 1e-30)
valid_idx = rel_diffs < 0.5
if np.any(rel_diffs < 0.01):
idx = np.argmin(rel_diffs)
return n, beta, dynamic_scale, diffs[idx], r_grid[idx], k_grid[idx]
if np.any(valid_idx):
candidates.extend([(diffs[i], n, beta, dynamic_scale, r_grid[i], k_grid[i]) for i in np.where(valid_idx)[0]])
if not candidates:
logging.error(f"invert_D: No valid candidates for value {value}")
return None, None, None, None, None, None
candidates = sorted(candidates, key=lambda x: x[0])[:5]
valid_vals = [D(n, beta, r, k, Omega, base, scale * s)
for _, n, beta, s, r, k in candidates if D(n, beta, r, k, Omega, base, scale * s) is not None]
if not valid_vals:
return None, None, None, None, None, None
emergent_uncertainty = np.std(valid_vals) if len(valid_vals) > 1 else abs(valid_vals[0]) * 0.01
if not np.isfinite(emergent_uncertainty):
logging.error(f"invert_D: Non-finite emergent uncertainty for value {value}")
return None, None, None, None, None, None
best = candidates[0]
return best[1], best[2], best[3], emergent_uncertainty, best[4], best[5]
except Exception as e:
logging.error(f"invert_D failed for value {value}: {e}")
return None, None, None, None, None, None
def parse_categorized_codata(filename):
try:
df = pd.read_csv(filename, sep='\t', header=0,
names=['name', 'value', 'uncertainty', 'unit', 'category'],
dtype={'name': str, 'value': float, 'uncertainty': float, 'unit': str, 'category': str},
na_values=['exact'])
df['uncertainty'] = df['uncertainty'].fillna(0.0)
required_columns = ['name', 'value', 'uncertainty', 'unit']
if not all(col in df.columns for col in required_columns):
missing = [col for col in required_columns if col not in df.columns]
raise ValueError(f"Missing required columns in {filename}: {missing}")
logging.info(f"Successfully parsed {len(df)} constants from {filename}")
return df
except FileNotFoundError:
logging.error(f"Input file {filename} not found")
raise
except Exception as e:
logging.error(f"Error parsing {filename}: {e}")
raise
def generate_emergent_constants(n_max=1000, beta_steps=10, r_values=[0.5, 1.0, 2.0], k_values=[0.5, 1.0, 2.0], Omega=1.0, base=2, scale=1.0):
candidates = []
n_values = np.linspace(0, n_max, 100)
beta_values = np.linspace(0, 1, beta_steps)
for n in tqdm(n_values, desc="Generating emergent constants"):
for beta in beta_values:
for r in r_values:
for k in k_values:
val = D(n, beta, r, k, Omega, base, scale)
if val is not None and np.isfinite(val):
candidates.append({
'n': n, 'beta': beta, 'value': val, 'r': r, 'k': k, 'scale': scale
})
return pd.DataFrame(candidates)
def match_to_codata(df_emergent, df_codata, tolerance=0.01, batch_size=100):
matches = []
output_file = "emergent_constants.txt"
with open(output_file, 'w', encoding='utf-8') as f:
pd.DataFrame(columns=['name', 'codata_value', 'emergent_value', 'n', 'beta', 'r', 'k', 'scale', 'error', 'rel_error', 'codata_uncertainty', 'bad_data', 'bad_data_reason']).to_csv(f, sep="\t", index=False)
for start in range(0, len(df_codata), batch_size):
batch = df_codata.iloc[start:start + batch_size]
for _, codata_row in tqdm(batch.iterrows(), total=len(batch), desc=f"Matching constants batch {start//batch_size + 1}"):
value = codata_row['value']
mask = abs(df_emergent['value'] - value) / max(abs(value), 1e-30) < tolerance
matched = df_emergent[mask]
for _, emergent_row in matched.iterrows():
error = abs(emergent_row['value'] - value)
rel_error = error / max(abs(value), 1e-30)
matches.append({
'name': codata_row['name'],
'codata_value': value,
'emergent_value': emergent_row['value'],
'n': emergent_row['n'],
'beta': emergent_row['beta'],
'r': emergent_row['r'],
'k': emergent_row['k'],
'scale': emergent_row['scale'],
'error': error,
'rel_error': rel_error,
'codata_uncertainty': codata_row['uncertainty'],
'bad_data': rel_error > 0.5 or (codata_row['uncertainty'] is not None and abs(codata_row['uncertainty'] - error) > 10 * codata_row['uncertainty']),
'bad_data_reason': f"High rel_error ({rel_error:.2e})" if rel_error > 0.5 else f"Uncertainty deviation ({codata_row['uncertainty']:.2e} vs. {error:.2e})" if (codata_row['uncertainty'] is not None and abs(codata_row['uncertainty'] - error) > 10 * codata_row['uncertainty']) else ""
})
try:
with open(output_file, 'a', encoding='utf-8') as f:
pd.DataFrame(matches).to_csv(f, sep="\t", index=False, header=False, lineterminator='\n')
f.flush()
matches = []
except Exception as e:
logging.error(f"Failed to save batch {start//batch_size + 1} to {output_file}: {e}")
return pd.DataFrame(pd.read_csv(output_file, sep='\t'))
def check_physical_consistency(df_results):
bad_data = []
relations = [
('Planck constant', 'reduced Planck constant', lambda x, y: abs(x['scale'] / y['scale'] - 2 * np.pi), 0.1, 'scale ratio vs. 2π'),
('proton mass', 'proton-electron mass ratio', lambda x, y: abs(x['n'] - y['n'] - np.log10(1836)), 0.5, 'n difference vs. log(proton-electron ratio)'),
('Fermi coupling constant', 'weak mixing angle', lambda x, y: abs(x['scale'] - y['scale'] / np.sqrt(2)), 0.1, 'scale vs. sin²θ_W/√2'),
('tau energy equivalent', 'tau mass energy equivalent in MeV', lambda x, y: abs(x['codata_value'] - y['codata_value']), 0.01, 'value consistency'),
('proton mass', 'electron mass', 'proton-electron mass ratio',
lambda x, y, z: abs(z['n'] - abs(x['n'] - y['n'])), 10.0, 'n inconsistency for mass ratio'),
('fine-structure constant', 'elementary charge', 'Planck constant',
lambda x, y, z: abs(x['codata_value'] - y['codata_value']**2 / (4 * np.pi * 8.854187817e-12 * z['codata_value'] * 299792458)), 0.01, 'fine-structure vs. e²/(4πε₀hc)'),
('Bohr magneton', 'elementary charge', 'Planck constant',
lambda x, y, z: abs(x['codata_value'] - y['codata_value'] * z['codata_value'] / (2 * 9.1093837e-31)), 0.01, 'Bohr magneton vs. eh/(2m_e)')
]
for relation in relations:
try:
if len(relation) == 5:
name1, name2, check_func, threshold, reason = relation
if name1 in df_results['name'].values and name2 in df_results['name'].values:
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
if check_func(row1, row2) > threshold:
bad_data.append((name1, f"Physical inconsistency: {reason}"))
bad_data.append((name2, f"Physical inconsistency: {reason}"))
elif len(relation) == 6:
name1, name2, name3, check_func, threshold, reason = relation
if all(name in df_results['name'].values for name in [name1, name2, name3]):
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
row3 = df_results[df_results['name'] == name3].iloc[0]
if check_func(row1, row2, row3) > threshold:
bad_data.append((name3, f"Physical inconsistency: {reason}"))
except Exception as e:
logging.warning(f"Physical consistency check failed for {relation}: {e}")
continue
return bad_data
def total_error(params, df_subset):
r, k, Omega, base, scale = params
df_results = symbolic_fit_all_constants(df_subset, base=base, Omega=Omega, r=r, k=k, scale=scale)
if df_results.empty:
return np.inf
valid_errors = df_results['error'].dropna()
return valid_errors.mean() if not valid_errors.empty else np.inf
def process_constant(row, r, k, Omega, base, scale):
try:
name, value, uncertainty, unit = row['name'], row['value'], row['uncertainty'], row['unit']
abs_value = abs(value)
sign = np.sign(value)
result = invert_D(abs_value, r=r, k=k, Omega=Omega, base=base, scale=scale)
if result[0] is None:
logging.warning(f"No valid fit for {name}")
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': None, 'beta': None, 'emergent_value': None,
'error': None, 'rel_error': None, 'codata_uncertainty': uncertainty, 'emergent_uncertainty': None,
'scale': None, 'bad_data': True, 'bad_data_reason': 'No valid fit found', 'r': None, 'k': None
}
n, beta, dynamic_scale, emergent_uncertainty, r_local, k_local = result
approx = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
if approx is None:
logging.warning(f"D returned None for {name}")
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': None, 'beta': None, 'emergent_value': None,
'error': None, 'rel_error': None, 'codata_uncertainty': uncertainty, 'emergent_uncertainty': None,
'scale': None, 'bad_data': True, 'bad_data_reason': 'D function returned None', 'r': None, 'k': None
}
approx *= sign
error = abs(approx - value)
rel_error = error / max(abs(value), 1e-30) if abs(value) > 0 else np.inf
bad_data = False
bad_data_reason = ""
if rel_error > 0.5:
bad_data = True
bad_data_reason += f"High relative error ({rel_error:.2e} > 0.5); "
if emergent_uncertainty is not None and uncertainty is not None:
if emergent_uncertainty > uncertainty * 20 or emergent_uncertainty < uncertainty / 20:
bad_data = True
bad_data_reason += f"Uncertainty deviates from emergent ({emergent_uncertainty:.2e} vs. {uncertainty:.2e}); "
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': n, 'beta': beta, 'emergent_value': approx,
'error': error, 'rel_error': rel_error, 'codata_uncertainty': uncertainty,
'emergent_uncertainty': emergent_uncertainty, 'scale': scale * dynamic_scale,
'bad_data': bad_data, 'bad_data_reason': bad_data_reason, 'r': r_local, 'k': k_local
}
except Exception as e:
logging.error(f"process_constant failed for {row['name']}: {e}")
return {
'name': row['name'], 'codata_value': row['value'], 'unit': row['unit'], 'n': None, 'beta': None,
'emergent_value': None, 'error': None, 'rel_error': None, 'codata_uncertainty': row['uncertainty'],
'emergent_uncertainty': None, 'scale': None, 'bad_data': True, 'bad_data_reason': f"Processing error: {str(e)}",
'r': None, 'k': None
}
def symbolic_fit_all_constants(df, base=2, Omega=1.0, r=1.0, k=1.0, scale=1.0, batch_size=100):
logging.info("Starting symbolic fit for all constants...")
results = []
output_file = "symbolic_fit_results_emergent_fixed.txt"
with open(output_file, 'w', encoding='utf-8') as f:
pd.DataFrame(columns=['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'rel_error',
'codata_uncertainty', 'emergent_uncertainty', 'scale', 'bad_data', 'bad_data_reason', 'r', 'k']).to_csv(f, sep="\t", index=False)
for start in range(0, len(df), batch_size):
batch = df.iloc[start:start + batch_size]
try:
batch_results = Parallel(n_jobs=12, timeout=120, backend='loky', maxtasksperchild=20)(
delayed(process_constant)(row, r, k, Omega, base, scale)
for row in tqdm(batch.to_dict('records'), total=len(batch), desc=f"Fitting constants batch {start//batch_size + 1}")
)
batch_results = [r for r in batch_results if r is not None]
results.extend(batch_results)
try:
with open(output_file, 'a', encoding='utf-8') as f:
pd.DataFrame(batch_results).to_csv(f, sep="\t", index=False, header=False, lineterminator='\n')
f.flush()
except Exception as e:
logging.error(f"Failed to save batch {start//batch_size + 1} to {output_file}: {e}")
except Exception as e:
logging.error(f"Parallel processing failed for batch {start//batch_size + 1}: {e}")
continue
df_results = pd.DataFrame(results)
if not df_results.empty:
df_results['bad_data'] = df_results.get('bad_data', False)
df_results['bad_data_reason'] = df_results.get('bad_data_reason', '')
for name in df_results['name'].unique():
mask = df_results['name'] == name
if df_results.loc[mask, 'codata_uncertainty'].notnull().any():
uncertainties = df_results.loc[mask, 'codata_uncertainty'].dropna()
if not uncertainties.empty:
Q1, Q3 = np.percentile(uncertainties, [25, 75])
IQR = Q3 - Q1
outlier_mask = (uncertainties < Q1 - 1.5 * IQR) | (uncertainties > Q3 + 1.5 * IQR)
if outlier_mask.any():
df_results.loc[mask & df_results['codata_uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data'] = True
df_results.loc[mask & df_results['codata_uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data_reason'] += 'Uncertainty outlier; '
high_rel_error_mask = df_results['rel_error'] > 0.5
df_results.loc[high_rel_error_mask, 'bad_data'] = True
df_results.loc[high_rel_error_mask, 'bad_data_reason'] += df_results.loc[high_rel_error_mask, 'rel_error'].apply(lambda x: f"High relative error ({x:.2e} > 0.5); ")
high_uncertainty_mask = (df_results['emergent_uncertainty'].notnull()) & (
(df_results['codata_uncertainty'] > 20 * df_results['emergent_uncertainty']) |
(df_results['codata_uncertainty'] < 0.05 * df_results['emergent_uncertainty'])
)
df_results.loc[high_uncertainty_mask, 'bad_data'] = True
df_results.loc[high_uncertainty_mask, 'bad_data_reason'] += df_results.loc[high_uncertainty_mask].apply(
lambda row: f"Uncertainty deviates from emergent ({row['codata_uncertainty']:.2e} vs. {row['emergent_uncertainty']:.2e}); ", axis=1)
bad_data = check_physical_consistency(df_results)
for name, reason in bad_data:
df_results.loc[df_results['name'] == name, 'bad_data'] = True
df_results.loc[df_results['name'] == name, 'bad_data_reason'] += reason + '; '
logging.info("Symbolic fit completed.")
return df_results
def select_worst_names(df, n_select=10):
categories = df['category'].unique()
n_per_category = max(1, n_select // len(categories))
selected = []
for category in categories:
cat_df = df[df['category'] == category]
if len(cat_df) > 0:
n_to_select = min(n_per_category, len(cat_df))
selected.extend(np.random.choice(cat_df['name'], size=n_to_select, replace=False))
if len(selected) < n_select:
remaining = df[~df['name'].isin(selected)]
if len(remaining) > 0:
selected.extend(np.random.choice(remaining['name'], size=n_select - len(selected), replace=False))
return selected[:n_select]
def signal_handler(sig, frame):
print("\nKeyboardInterrupt detected. Saving partial results...")
logging.info("KeyboardInterrupt detected. Exiting gracefully.")
for output_file in ["emergent_constants.txt", "symbolic_fit_results_emergent_fixed.txt"]:
try:
with open(output_file, 'a', encoding='utf-8') as f:
f.flush()
except Exception as e:
logging.error(f"Failed to flush {output_file} on interrupt: {e}")
sys.exit(0)
def main():
signal.signal(signal.SIGINT, signal_handler)
start_time = time.time()
stages = ['Parsing data', 'Generating emergent constants', 'Optimizing parameters', 'Fitting all constants', 'Generating plots']
progress = tqdm(stages, desc="Overall progress")
# Stage 1: Parse data
input_file = "categorized_allascii.txt"
if not os.path.exists(input_file):
raise FileNotFoundError(f"{input_file} not found in the current directory")
df = parse_categorized_codata(input_file)
logging.info(f"Parsed {len(df)} constants")
progress.update(1)
# Stage 2: Generate emergent constants
emergent_df = generate_emergent_constants(n_max=1000, beta_steps=10)
matched_df = match_to_codata(emergent_df, df, tolerance=0.01, batch_size=100)
logging.info("Saved emergent constants to emergent_constants.txt")
progress.update(1)
# Stage 3: Optimize parameters
worst_names = select_worst_names(df, n_select=10)
print(f"Selected constants for optimization: {worst_names}")
subset_df = df[df['name'].isin(worst_names)]
if subset_df.empty:
subset_df = df.head(50)
init_params = [0.5, 0.5, 0.5, 2.0, 0.1]
bounds = [(1e-10, 100), (1e-10, 100), (1e-10, 100), (1.5, 20), (1e-10, 1000)]
try:
res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='SLSQP', options={'maxiter': 50, 'disp': True})
if not res.success:
logging.warning(f"Optimization failed: {res.message}")
r_opt, k_opt, Omega_opt, base_opt, scale_opt = init_params
else:
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
except Exception as e:
logging.error(f"Optimization failed: {e}")
r_opt, k_opt, Omega_opt, base_opt, scale_opt = init_params
print(f"Optimization failed: {e}. Using default parameters.")
progress.update(1)
# Stage 4: Run final fit
df_results = symbolic_fit_all_constants(df, base=base_opt, Omega=Omega_opt, r=r_opt, k=k_opt, scale=scale_opt, batch_size=100)
if not df_results.empty:
with open("symbolic_fit_results.txt", 'w', encoding='utf-8') as f:
df_results.to_csv(f, sep="\t", index=False)
f.flush()
logging.info(f"Saved final results to symbolic_fit_results.txt")
else:
logging.error("No results to save")
progress.update(1)
# Stage 5: Generate plots
df_results_sorted = df_results.sort_values("error", na_position='last')
print("\nTop 20 best symbolic fits:")
print(df_results_sorted.head(20)[['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'codata_uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(df_results_sorted.tail(20)[['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'codata_uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nPotentially bad data constants summary (possible cheated data):")
bad_data_df = df_results[df_results['bad_data'] == True][['name', 'codata_value', 'error', 'rel_error', 'codata_uncertainty', 'emergent_uncertainty', 'bad_data_reason']]
bad_data_df = bad_data_df.sort_values('rel_error', ascending=False, na_position='last')
print(bad_data_df.to_string(index=False))
print("\nTop 20 emergent constants matches:")
matched_df_sorted = matched_df.sort_values('error', na_position='last')
print(matched_df_sorted.head(20)[['name', 'codata_value', 'emergent_value', 'n', 'beta', 'error', 'rel_error', 'codata_uncertainty', 'bad_data', 'bad_data_reason']].to_string(index=False))
plt.figure(figsize=(10, 5))
plt.hist(df_results_sorted['error'].dropna(), bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.savefig('histogram_errors.png')
plt.close()
plt.figure(figsize=(10, 5))
plt.scatter(df_results_sorted['n'], df_results_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n (Fitted)')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.savefig('scatter_n_error.png')
plt.close()
plt.figure(figsize=(10, 5))
worst_fits = df_results_sorted.tail(20)
plt.bar(worst_fits['name'], worst_fits['error'], color='salmon', edgecolor='black')
plt.xticks(rotation=90)
plt.title('Absolute Errors for Top 20 Worst Symbolic Fits')
plt.xlabel('Constant Name')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.savefig('bar_worst_fits.png')
plt.close()
plt.figure(figsize=(10, 5))
plt.scatter(matched_df_sorted['codata_value'], matched_df_sorted['emergent_value'], alpha=0.5, s=15, c='purple', edgecolors='black')
plt.plot([matched_df_sorted['codata_value'].min(), matched_df_sorted['codata_value'].max()],
[matched_df_sorted['codata_value'].min(), matched_df_sorted['codata_value'].max()], 'k--')
plt.title('Emergent Constants vs. CODATA Values')
plt.xlabel('CODATA Value')
plt.ylabel('Emergent Value')
plt.grid(True)
plt.tight_layout()
plt.savefig('scatter_codata_emergent.png')
plt.close()
progress.update(1)
logging.info(f"Total runtime: {time.time() - start_time:.2f} seconds")
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
signal_handler(None, None)
gpu2.py
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
import logging
import time
import matplotlib.pyplot as plt
import os
import signal
import sys
# Set up logging
logging.basicConfig(filename='symbolic_fit_optimized.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
# Extended primes list (up to 1000)
PRIMES = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233,
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419,
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
]
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
if n > 100:
return 0.0
term1 = phi**n / np.sqrt(5)
term2 = ((1/phi)**n) * np.cos(np.pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
try:
n = np.asarray(n)
beta = np.asarray(beta)
r = np.asarray(r)
k = np.asarray(k)
Omega = np.asarray(Omega)
base = np.asarray(base)
scale = np.asarray(scale)
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
Pn_beta = PRIMES[idx]
log_dyadic = (n + beta) * np.log(np.maximum(base, 1e-10))
log_dyadic = np.where((log_dyadic > 700) | (log_dyadic < -700), np.nan, log_dyadic)
log_val = np.log(np.maximum(scale, 1e-30)) + np.log(phi) + np.log(np.maximum(np.abs(Fn_beta), 1e-30)) + log_dyadic + np.log(np.maximum(Omega, 1e-30))
log_val = np.where(np.abs(n - 1000) < 1e-3, log_val, log_val + np.log(np.clip(np.log(np.maximum(n, 1e-10)) / np.log(1000), 1e-10, np.inf)))
val = np.where(np.isfinite(log_val), np.exp(log_val) * np.sign(Fn_beta), np.nan)
result = np.sqrt(np.maximum(np.abs(val), 1e-30)) * (r ** k) * np.sign(val)
return result
except Exception as e:
logging.error(f"D failed: n={n}, beta={beta}, r={r}, k={k}, Omega={Omega}, base={base}, scale={scale}, error={e}")
return None
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
max_n = min(1000, max(200, int(100 * abs(log_val) + 100)))
n_values = np.linspace(0, max_n, 10) # Reduced resolution
scale_factors = np.logspace(max(log_val - 5, -20), min(log_val + 5, 20), num=2) # Tighter range
r_values = [0.5, 1.0, 2.0]
k_values = [0.5, 1.0, 2.0]
try:
r_grid, k_grid = np.meshgrid(r_values, k_values)
r_grid = r_grid.ravel()
k_grid = k_grid.ravel()
for n in n_values:
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
vals = D(n, beta, r_grid, k_grid, Omega, base, scale * dynamic_scale)
if vals is None or not np.all(np.isfinite(vals)):
continue
diffs = np.abs(vals - abs(value))
rel_diffs = diffs / max(abs(value), 1e-30)
valid_idx = rel_diffs < 0.5
if np.any(rel_diffs < 0.01):
idx = np.argmin(rel_diffs)
return n, beta, dynamic_scale, diffs[idx], r_grid[idx], k_grid[idx]
if np.any(valid_idx):
candidates.extend([(diffs[i], n, beta, dynamic_scale, r_grid[i], k_grid[i]) for i in np.where(valid_idx)[0]])
if not candidates:
logging.error(f"invert_D: No valid candidates for value {value}")
return None, None, None, None, None, None
candidates = sorted(candidates, key=lambda x: x[0])[:5]
valid_vals = [D(n, beta, r, k, Omega, base, scale * s)
for _, n, beta, s, r, k in candidates if D(n, beta, r, k, Omega, base, scale * s) is not None]
if not valid_vals:
return None, None, None, None, None, None
emergent_uncertainty = np.std(valid_vals) if len(valid_vals) > 1 else abs(valid_vals[0]) * 0.01
if not np.isfinite(emergent_uncertainty):
logging.error(f"invert_D: Non-finite emergent uncertainty for value {value}")
return None, None, None, None, None, None
best = candidates[0]
return best[1], best[2], best[3], emergent_uncertainty, best[4], best[5]
except Exception as e:
logging.error(f"invert_D failed for value {value}: {e}")
return None, None, None, None, None, None
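# Search budget of this invert_D: 10 n values x 10 beta values x 2 scale factors = 200 calls
# to D, each evaluated over the flattened 3x3 (r, k) meshgrid, i.e. 1,800 candidate values per
# constant, with an early return as soon as a relative difference below 1% is found.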
def parse_categorized_codata(filename):
try:
df = pd.read_csv(filename, sep='\t', header=0,
names=['name', 'value', 'uncertainty', 'unit', 'category'],
dtype={'name': str, 'value': float, 'uncertainty': float, 'unit': str, 'category': str},
na_values=['exact'])
df['uncertainty'] = df['uncertainty'].fillna(0.0)
required_columns = ['name', 'value', 'uncertainty', 'unit']
if not all(col in df.columns for col in required_columns):
missing = [col for col in required_columns if col not in df.columns]
raise ValueError(f"Missing required columns in {filename}: {missing}")
logging.info(f"Successfully parsed {len(df)} constants from {filename}")
return df
except FileNotFoundError:
logging.error(f"Input file {filename} not found")
raise
except Exception as e:
logging.error(f"Error parsing {filename}: {e}")
raise
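# Input contract: the file is read as tab-separated columns name, value, uncertainty, unit,
# category with one header row; the literal string 'exact' in the uncertainty column is parsed
# as NaN and then replaced by 0.0.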
def generate_emergent_constants(n_max=1000, beta_steps=10, r_values=[0.5, 1.0, 2.0], k_values=[0.5, 1.0, 2.0], Omega=1.0, base=2, scale=1.0):
candidates = []
n_values = np.linspace(0, n_max, 100)
beta_values = np.linspace(0, 1, beta_steps)
for n in tqdm(n_values, desc="Generating emergent constants"):
for beta in beta_values:
for r in r_values:
for k in k_values:
val = D(n, beta, r, k, Omega, base, scale)
if val is not None and np.isfinite(val):
candidates.append({
'n': n, 'beta': beta, 'value': val, 'r': r, 'k': k, 'scale': scale
})
return pd.DataFrame(candidates)
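# Candidate grid above: 100 n values x 10 beta values x 3 r values x 3 k values = 9,000
# emergent candidates (minus any non-finite D evaluations) returned as a DataFrame.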
def match_to_codata(df_emergent, df_codata, tolerance=0.01, batch_size=100):
matches = []
output_file = "emergent_constants.txt"
with open(output_file, 'w', encoding='utf-8') as f:
pd.DataFrame(columns=['name', 'codata_value', 'emergent_value', 'n', 'beta', 'r', 'k', 'scale', 'error', 'rel_error', 'codata_uncertainty', 'bad_data', 'bad_data_reason']).to_csv(f, sep="\t", index=False)
for start in range(0, len(df_codata), batch_size):
batch = df_codata.iloc[start:start + batch_size]
for _, codata_row in tqdm(batch.iterrows(), total=len(batch), desc=f"Matching constants batch {start//batch_size + 1}"):
value = codata_row['value']
mask = abs(df_emergent['value'] - value) / max(abs(value), 1e-30) < tolerance
matched = df_emergent[mask]
for _, emergent_row in matched.iterrows():
error = abs(emergent_row['value'] - value)
rel_error = error / max(abs(value), 1e-30)
matches.append({
'name': codata_row['name'],
'codata_value': value,
'emergent_value': emergent_row['value'],
'n': emergent_row['n'],
'beta': emergent_row['beta'],
'r': emergent_row['r'],
'k': emergent_row['k'],
'scale': emergent_row['scale'],
'error': error,
'rel_error': rel_error,
'codata_uncertainty': codata_row['uncertainty'],
'bad_data': rel_error > 0.5 or (codata_row['uncertainty'] is not None and abs(codata_row['uncertainty'] - error) > 10 * codata_row['uncertainty']),
'bad_data_reason': f"High rel_error ({rel_error:.2e})" if rel_error > 0.5 else f"Uncertainty deviation ({codata_row['uncertainty']:.2e} vs. {error:.2e})" if (codata_row['uncertainty'] is not None and abs(codata_row['uncertainty'] - error) > 10 * codata_row['uncertainty']) else ""
})
try:
with open(output_file, 'a', encoding='utf-8') as f:
pd.DataFrame(matches).to_csv(f, sep="\t", index=False, header=False, lineterminator='\n')
f.flush()
matches = []
except Exception as e:
logging.error(f"Failed to save batch {start//batch_size + 1} to {output_file}: {e}")
return pd.DataFrame(pd.read_csv(output_file, sep='\t'))
def check_physical_consistency(df_results):
bad_data = []
relations = [
('Planck constant', 'reduced Planck constant', lambda x, y: abs(x['scale'] / y['scale'] - 2 * np.pi), 0.1, 'scale ratio vs. 2π'),
('proton mass', 'proton-electron mass ratio', lambda x, y: abs(x['n'] - y['n'] - np.log10(1836)), 0.5, 'n difference vs. log(proton-electron ratio)'),
('Fermi coupling constant', 'weak mixing angle', lambda x, y: abs(x['scale'] - y['scale'] / np.sqrt(2)), 0.1, 'scale vs. sin²θ_W/√2'),
('tau energy equivalent', 'tau mass energy equivalent in MeV', lambda x, y: abs(x['codata_value'] - y['codata_value']), 0.01, 'value consistency'),
('proton mass', 'electron mass', 'proton-electron mass ratio',
lambda x, y, z: abs(z['n'] - abs(x['n'] - y['n'])), 10.0, 'n inconsistency for mass ratio'),
('fine-structure constant', 'elementary charge', 'Planck constant',
lambda x, y, z: abs(x['codata_value'] - y['codata_value']**2 / (4 * np.pi * 8.854187817e-12 * z['codata_value'] * 299792458)), 0.01, 'fine-structure vs. e²/(4πε₀hc)'),
('Bohr magneton', 'elementary charge', 'Planck constant',
lambda x, y, z: abs(x['codata_value'] - y['codata_value'] * z['codata_value'] / (2 * 9.1093837e-31)), 0.01, 'Bohr magneton vs. eh/(2m_e)')
]
for relation in relations:
try:
if len(relation) == 5:
name1, name2, check_func, threshold, reason = relation
if name1 in df_results['name'].values and name2 in df_results['name'].values:
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
if check_func(row1, row2) > threshold:
bad_data.append((name1, f"Physical inconsistency: {reason}"))
bad_data.append((name2, f"Physical inconsistency: {reason}"))
elif len(relation) == 6:
name1, name2, name3, check_func, threshold, reason = relation
if all(name in df_results['name'].values for name in [name1, name2, name3]):
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
row3 = df_results[df_results['name'] == name3].iloc[0]
if check_func(row1, row2, row3) > threshold:
bad_data.append((name3, f"Physical inconsistency: {reason}"))
except Exception as e:
logging.warning(f"Physical consistency check failed for {relation}: {e}")
continue
return bad_data
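# Note on the three-way relations above: the fine-structure and Bohr magneton checks use the
# Planck constant h where hbar = h/(2*pi) appears in the standard formulas
# (alpha = e^2 / (4*pi*eps0*hbar*c), mu_B = e*hbar / (2*m_e)), so each expression carries a
# stray factor of 2*pi and its residual is nonzero even for perfectly consistent inputs.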
def total_error(params, df_subset):
r, k, Omega, base, scale = params
df_results = symbolic_fit_all_constants(df_subset, base=base, Omega=Omega, r=r, k=k, scale=scale)
if df_results.empty:
return np.inf
valid_errors = df_results['error'].dropna()
return valid_errors.mean() if not valid_errors.empty else np.inf
def process_constant(row, r, k, Omega, base, scale):
try:
name, value, uncertainty, unit = row['name'], row['value'], row['uncertainty'], row['unit']
abs_value = abs(value)
sign = np.sign(value)
result = invert_D(abs_value, r=r, k=k, Omega=Omega, base=base, scale=scale)
if result[0] is None:
logging.warning(f"No valid fit for {name}")
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': None, 'beta': None, 'emergent_value': None,
'error': None, 'rel_error': None, 'codata_uncertainty': uncertainty, 'emergent_uncertainty': None,
'scale': None, 'bad_data': True, 'bad_data_reason': 'No valid fit found', 'r': None, 'k': None
}
n, beta, dynamic_scale, emergent_uncertainty, r_local, k_local = result
approx = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
if approx is None:
logging.warning(f"D returned None for {name}")
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': None, 'beta': None, 'emergent_value': None,
'error': None, 'rel_error': None, 'codata_uncertainty': uncertainty, 'emergent_uncertainty': None,
'scale': None, 'bad_data': True, 'bad_data_reason': 'D function returned None', 'r': None, 'k': None
}
approx *= sign
error = abs(approx - value)
rel_error = error / max(abs(value), 1e-30) if abs(value) > 0 else np.inf
bad_data = False
bad_data_reason = ""
if rel_error > 0.5:
bad_data = True
bad_data_reason += f"High relative error ({rel_error:.2e} > 0.5); "
if emergent_uncertainty is not None and uncertainty is not None:
if emergent_uncertainty > uncertainty * 20 or emergent_uncertainty < uncertainty / 20:
bad_data = True
bad_data_reason += f"Uncertainty deviates from emergent ({emergent_uncertainty:.2e} vs. {uncertainty:.2e}); "
return {
'name': name, 'codata_value': value, 'unit': unit, 'n': n, 'beta': beta, 'emergent_value': approx,
'error': error, 'rel_error': rel_error, 'codata_uncertainty': uncertainty,
'emergent_uncertainty': emergent_uncertainty, 'scale': scale * dynamic_scale,
'bad_data': bad_data, 'bad_data_reason': bad_data_reason, 'r': r_local, 'k': k_local
}
except Exception as e:
logging.error(f"process_constant failed for {row['name']}: {e}")
return {
'name': row['name'], 'codata_value': row['value'], 'unit': row['unit'], 'n': None, 'beta': None,
'emergent_value': None, 'error': None, 'rel_error': None, 'codata_uncertainty': row['uncertainty'],
'emergent_uncertainty': None, 'scale': None, 'bad_data': True, 'bad_data_reason': f"Processing error: {str(e)}",
'r': None, 'k': None
}
def symbolic_fit_all_constants(df, base=2, Omega=1.0, r=1.0, k=1.0, scale=1.0, batch_size=100):
logging.info("Starting symbolic fit for all constants...")
results = []
output_file = "symbolic_fit_results_emergent_fixed.txt"
with open(output_file, 'w', encoding='utf-8') as f:
pd.DataFrame(columns=['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'rel_error',
'codata_uncertainty', 'emergent_uncertainty', 'scale', 'bad_data', 'bad_data_reason', 'r', 'k']).to_csv(f, sep="\t", index=False)
for start in range(0, len(df), batch_size):
batch = df.iloc[start:start + batch_size]
try:
batch_results = Parallel(n_jobs=12, timeout=120, backend='loky', maxtasksperchild=20)(
delayed(process_constant)(row, r, k, Omega, base, scale)
for row in tqdm(batch.to_dict('records'), total=len(batch), desc=f"Fitting constants batch {start//batch_size + 1}")
)
batch_results = [r for r in batch_results if r is not None]
results.extend(batch_results)
try:
with open(output_file, 'a', encoding='utf-8') as f:
pd.DataFrame(batch_results).to_csv(f, sep="\t", index=False, header=False, lineterminator='\n')
f.flush()
except Exception as e:
logging.error(f"Failed to save batch {start//batch_size + 1} to {output_file}: {e}")
except Exception as e:
logging.error(f"Parallel processing failed for batch {start//batch_size + 1}: {e}")
continue
df_results = pd.DataFrame(results)
if not df_results.empty:
df_results['bad_data'] = df_results.get('bad_data', False)
df_results['bad_data_reason'] = df_results.get('bad_data_reason', '')
for name in df_results['name'].unique():
mask = df_results['name'] == name
if df_results.loc[mask, 'codata_uncertainty'].notnull().any():
uncertainties = df_results.loc[mask, 'codata_uncertainty'].dropna()
if not uncertainties.empty:
Q1, Q3 = np.percentile(uncertainties, [25, 75])
IQR = Q3 - Q1
outlier_mask = (uncertainties < Q1 - 1.5 * IQR) | (uncertainties > Q3 + 1.5 * IQR)
if outlier_mask.any():
df_results.loc[mask & df_results['codata_uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data'] = True
df_results.loc[mask & df_results['codata_uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data_reason'] += 'Uncertainty outlier; '
high_rel_error_mask = df_results['rel_error'] > 0.5
df_results.loc[high_rel_error_mask, 'bad_data'] = True
df_results.loc[high_rel_error_mask, 'bad_data_reason'] += df_results.loc[high_rel_error_mask, 'rel_error'].apply(lambda x: f"High relative error ({x:.2e} > 0.5); ")
high_uncertainty_mask = (df_results['emergent_uncertainty'].notnull()) & (
(df_results['codata_uncertainty'] > 20 * df_results['emergent_uncertainty']) |
(df_results['codata_uncertainty'] < 0.05 * df_results['emergent_uncertainty'])
)
df_results.loc[high_uncertainty_mask, 'bad_data'] = True
df_results.loc[high_uncertainty_mask, 'bad_data_reason'] += df_results.loc[high_uncertainty_mask].apply(
lambda row: f"Uncertainty deviates from emergent ({row['codata_uncertainty']:.2e} vs. {row['emergent_uncertainty']:.2e}); ", axis=1)
bad_data = check_physical_consistency(df_results)
for name, reason in bad_data:
df_results.loc[df_results['name'] == name, 'bad_data'] = True
df_results.loc[df_results['name'] == name, 'bad_data_reason'] += reason + '; '
logging.info("Symbolic fit completed.")
return df_results
def select_worst_names(df, n_select=10):
categories = df['category'].unique()
n_per_category = max(1, n_select // len(categories))
selected = []
for category in categories:
cat_df = df[df['category'] == category]
if len(cat_df) > 0:
n_to_select = min(n_per_category, len(cat_df))
selected.extend(np.random.choice(cat_df['name'], size=n_to_select, replace=False))
if len(selected) < n_select:
remaining = df[~df['name'].isin(selected)]
if len(remaining) > 0:
selected.extend(np.random.choice(remaining['name'], size=n_select - len(selected), replace=False))
return selected[:n_select]
def signal_handler(sig, frame):
print("\nKeyboardInterrupt detected. Saving partial results...")
logging.info("KeyboardInterrupt detected. Exiting gracefully.")
for output_file in ["emergent_constants.txt", "symbolic_fit_results_emergent_fixed.txt"]:
try:
with open(output_file, 'a', encoding='utf-8') as f:
f.flush()
except Exception as e:
logging.error(f"Failed to flush {output_file} on interrupt: {e}")
sys.exit(0)
def main():
signal.signal(signal.SIGINT, signal_handler)
start_time = time.time()
stages = ['Parsing data', 'Generating emergent constants', 'Optimizing parameters', 'Fitting all constants', 'Generating plots']
progress = tqdm(stages, desc="Overall progress")
# Stage 1: Parse data
input_file = "categorized_allascii.txt"
if not os.path.exists(input_file):
raise FileNotFoundError(f"{input_file} not found in the current directory")
df = parse_categorized_codata(input_file)
logging.info(f"Parsed {len(df)} constants")
progress.update(1)
# Stage 2: Generate emergent constants
emergent_df = generate_emergent_constants(n_max=1000, beta_steps=10)
matched_df = match_to_codata(emergent_df, df, tolerance=0.01, batch_size=100)
logging.info("Saved emergent constants to emergent_constants.txt")
progress.update(1)
# Stage 3: Optimize parameters
worst_names = select_worst_names(df, n_select=10)
print(f"Selected constants for optimization: {worst_names}")
subset_df = df[df['name'].isin(worst_names)]
if subset_df.empty:
subset_df = df.head(50)
init_params = [0.5, 0.5, 0.5, 2.0, 0.1]
bounds = [(1e-10, 100), (1e-10, 100), (1e-10, 100), (1.5, 20), (1e-10, 1000)]
try:
res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='SLSQP', options={'maxiter': 50, 'disp': True})
if not res.success:
logging.warning(f"Optimization failed: {res.message}")
r_opt, k_opt, Omega_opt, base_opt, scale_opt = init_params
else:
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
except Exception as e:
logging.error(f"Optimization failed: {e}")
r_opt, k_opt, Omega_opt, base_opt, scale_opt = init_params
print(f"Optimization failed: {e}. Using default parameters.")
progress.update(1)
# Stage 4: Run final fit
df_results = symbolic_fit_all_constants(df, base=base_opt, Omega=Omega_opt, r=r_opt, k=k_opt, scale=scale_opt, batch_size=100)
if not df_results.empty:
with open("symbolic_fit_results.txt", 'w', encoding='utf-8') as f:
df_results.to_csv(f, sep="\t", index=False)
f.flush()
logging.info(f"Saved final results to symbolic_fit_results.txt")
else:
logging.error("No results to save")
progress.update(1)
# Stage 5: Generate plots
df_results_sorted = df_results.sort_values("error", na_position='last')
print("\nTop 20 best symbolic fits:")
print(df_results_sorted.head(20)[['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'codata_uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(df_results_sorted.tail(20)[['name', 'codata_value', 'unit', 'n', 'beta', 'emergent_value', 'error', 'codata_uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nPotentially bad data constants summary (possible cheated data):")
bad_data_df = df_results[df_results['bad_data'] == True][['name', 'codata_value', 'error', 'rel_error', 'codata_uncertainty', 'emergent_uncertainty', 'bad_data_reason']]
bad_data_df = bad_data_df.sort_values('rel_error', ascending=False, na_position='last')
print(bad_data_df.to_string(index=False))
print("\nTop 20 emergent constants matches:")
matched_df_sorted = matched_df.sort_values('error', na_position='last')
print(matched_df_sorted.head(20)[['name', 'codata_value', 'emergent_value', 'n', 'beta', 'error', 'rel_error', 'codata_uncertainty', 'bad_data', 'bad_data_reason']].to_string(index=False))
plt.figure(figsize=(10, 5))
plt.hist(df_results_sorted['error'].dropna(), bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.savefig('histogram_errors.png')
plt.close()
plt.figure(figsize=(10, 5))
plt.scatter(df_results_sorted['n'], df_results_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n (Fitted)')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.savefig('scatter_n_error.png')
plt.close()
plt.figure(figsize=(10, 5))
worst_fits = df_results_sorted.tail(20)
plt.bar(worst_fits['name'], worst_fits['error'], color='salmon', edgecolor='black')
plt.xticks(rotation=90)
plt.title('Absolute Errors for Top 20 Worst Symbolic Fits')
plt.xlabel('Constant Name')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.savefig('bar_worst_fits.png')
plt.close()
plt.figure(figsize=(10, 5))
plt.scatter(matched_df_sorted['codata_value'], matched_df_sorted['emergent_value'], alpha=0.5, s=15, c='purple', edgecolors='black')
plt.plot([matched_df_sorted['codata_value'].min(), matched_df_sorted['codata_value'].max()],
[matched_df_sorted['codata_value'].min(), matched_df_sorted['codata_value'].max()], 'k--')
plt.title('Emergent Constants vs. CODATA Values')
plt.xlabel('CODATA Value')
plt.ylabel('Emergent Value')
plt.grid(True)
plt.tight_layout()
plt.savefig('scatter_codata_emergent.png')
plt.close()
progress.update(1)
logging.info(f"Total runtime: {time.time() - start_time:.2f} seconds")
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
signal_handler(None, None)
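A minimal round-trip sketch for the D / invert_D pair defined in the script above (illustrative only, not part of any zipped file; the input value is just an example constant):
value = 6.62607015e-34  # example input; any positive float will do
n, beta, dyn_scale, uncert, r_fit, k_fit = invert_D(abs(value))
if n is not None:
    # reproduce the fitted value with the returned parameters (Omega=1, base=2)
    approx = D(n, beta, r_fit, k_fit, 1.0, 2, dyn_scale)
    if approx is not None:
        print(f"n={n:.2f} beta={beta:.2f} rel_err={abs(approx - abs(value)) / abs(value):.2e}")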
gpu1_optimized2.py
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
# Extended primes list (up to 1000)
PRIMES = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233,
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419,
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
]
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
from math import cos, pi, sqrt
phi_inv = 1 / phi
if n > 100:
return 0.0
term1 = phi**n / sqrt(5)
term2 = (phi_inv**n) * cos(pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
Pn_beta = PRIMES[idx]
dyadic = base ** (n + beta)
val = scale * phi * Fn_beta * dyadic * Pn_beta * Omega
val = np.maximum(val, 1e-30)
return np.sqrt(val) * (r ** k)
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=500):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
scale_factors = np.logspace(log_val - 2, log_val + 2, num=10)
# Adjust max_n based on constant magnitude
max_n = min(max_n, max(50, int(10 * log_val)))
for n in np.linspace(0, max_n, steps):
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
val = D(n, beta, r, k, Omega, base, scale * dynamic_scale)
diff = abs(val - value)
candidates.append((diff, n, beta, dynamic_scale))
best = min(candidates, key=lambda x: x[0])
return best[1], best[2], best[3]
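# Search budget of this invert_D: `steps` n values (500 by default) x 10 beta values x 10
# scale factors = 50,000 D evaluations per constant; every candidate is kept in memory and
# only the single closest one is returned, with no uncertainty estimate.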
def parse_codata_ascii(filename):
constants = []
pattern = re.compile(r"^\s*(.*?)\s{2,}([0-9Ee\+\-\.]+)\s+([0-9Ee\+\-\.]+|exact)\s+(\S+)")
with open(filename, "r") as f:
for line in f:
if line.startswith("Quantity") or line.strip() == "" or line.startswith("-"):
continue
m = pattern.match(line)
if m:
name, value_str, uncert_str, unit = m.groups()
try:
value = float(value_str.replace("e", "E"))
uncertainty = None if uncert_str == "exact" else float(uncert_str.replace("e", "E"))
constants.append({
"name": name.strip(),
"value": value,
"uncertainty": uncertainty,
"unit": unit.strip()
})
except:
continue
return pd.DataFrame(constants)
def fit_single_constant(row, r, k, Omega, base, scale, max_n, steps):
val = row['value']
if val <= 0 or val > 1e50:
return None
try:
n, beta, dynamic_scale = invert_D(val, r, k, Omega, base, scale, max_n, steps)
approx = D(n, beta, r, k, Omega, base, scale * dynamic_scale)
error = abs(val - approx)
return {
"name": row['name'],
"value": val,
"unit": row['unit'],
"n": n,
"beta": beta,
"approx": approx,
"error": error,
"uncertainty": row['uncertainty'],
"scale": dynamic_scale
}
except Exception as e:
print(f"Failed inversion for {row['name']}: {e}")
return None
def symbolic_fit_all_constants(df, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=500):
results = Parallel(n_jobs=20)(
delayed(fit_single_constant)(row, r, k, Omega, base, scale, max_n, steps)
for _, row in df.iterrows()
)
return pd.DataFrame([r for r in results if r is not None])
def total_error(params, df):
r, k, Omega, base, scale = params
df_fit = symbolic_fit_all_constants(df, r=r, k=k, Omega=Omega, base=base, scale=scale, max_n=500, steps=500)
threshold = np.percentile(df_fit['error'], 95)
filtered = df_fit[df_fit['error'] <= threshold]
rel_err = ((filtered['value'] - filtered['approx']) / filtered['value'])**2
return rel_err.sum()
if __name__ == "__main__":
print("Parsing CODATA constants from allascii.txt...")
codata_df = parse_codata_ascii("allascii.txt")
print(f"Parsed {len(codata_df)} constants.")
# Use a subset for optimization
subset_df = codata_df.head(20)
init_params = [1.0, 1.0, 1.0, 2.0, 1.0]
bounds = [(1e-5, 10), (1e-5, 10), (1e-5, 10), (1.5, 10), (1e-5, 100)]
print("Optimizing symbolic model parameters...")
res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='L-BFGS-B', options={'maxiter': 100})
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
print("Fitting symbolic dimensions to all constants...")
fitted_df = symbolic_fit_all_constants(codata_df, r=r_opt, k=k_opt, Omega=Omega_opt, base=base_opt, scale=scale_opt, max_n=500, steps=500)
fitted_df_sorted = fitted_df.sort_values("error")
print("\nTop 20 best symbolic fits:")
print(fitted_df_sorted.head(20).to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(fitted_df_sorted.tail(20).to_string(index=False))
fitted_df_sorted.to_csv("symbolic_fit_results.txt", sep="\t", index=False)
plt.figure(figsize=(10, 5))
plt.hist(fitted_df_sorted['error'], bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.show()
plt.figure(figsize=(10, 5))
plt.scatter(fitted_df_sorted['n'], fitted_df_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.show()
fudge10_fixed.py
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
# Extended primes list (up to 1000)
PRIMES = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233,
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419,
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
]
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
from math import cos, pi, sqrt
phi_inv = 1 / phi
if n > 100:
return 0.0
term1 = phi**n / sqrt(5)
term2 = (phi_inv**n) * cos(pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
Pn_beta = PRIMES[idx]
dyadic = base ** (n + beta)
val = scale * phi * Fn_beta * dyadic * Pn_beta * Omega
val = np.maximum(val, 1e-30)
return np.sqrt(val) * (r ** k)
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=500):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
scale_factors = np.logspace(log_val - 2, log_val + 2, num=10)
# Adjust max_n based on constant magnitude
max_n = min(max_n, max(50, int(10 * log_val)))
for n in np.linspace(0, max_n, steps):
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
val = D(n, beta, r, k, Omega, base, scale * dynamic_scale)
diff = abs(val - value)
candidates.append((diff, n, beta, dynamic_scale))
best = min(candidates, key=lambda x: x[0])
return best[1], best[2], best[3]
def parse_codata_ascii(filename):
constants = []
pattern = re.compile(r"^\s*(.*?)\s{2,}([0-9Ee\+\-\.]+)\s+([0-9Ee\+\-\.]+|exact)\s+(\S+)")
with open(filename, "r") as f:
for line in f:
if line.startswith("Quantity") or line.strip() == "" or line.startswith("-"):
continue
m = pattern.match(line)
if m:
name, value_str, uncert_str, unit = m.groups()
try:
value = float(value_str.replace("e", "E"))
uncertainty = None if uncert_str == "exact" else float(uncert_str.replace("e", "E"))
constants.append({
"name": name.strip(),
"value": value,
"uncertainty": uncertainty,
"unit": unit.strip()
})
except:
continue
return pd.DataFrame(constants)
def fit_single_constant(row, r, k, Omega, base, scale, max_n, steps):
val = row['value']
if val <= 0 or val > 1e50:
return None
try:
n, beta, dynamic_scale = invert_D(val, r, k, Omega, base, scale, max_n, steps)
approx = D(n, beta, r, k, Omega, base, scale * dynamic_scale)
error = abs(val - approx)
return {
"name": row['name'],
"value": val,
"unit": row['unit'],
"n": n,
"beta": beta,
"approx": approx,
"error": error,
"uncertainty": row['uncertainty'],
"scale": dynamic_scale
}
except Exception as e:
print(f"Failed inversion for {row['name']}: {e}")
return None
def symbolic_fit_all_constants(df, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=500):
results = Parallel(n_jobs=20)(
delayed(fit_single_constant)(row, r, k, Omega, base, scale, max_n, steps)
for _, row in df.iterrows()
)
return pd.DataFrame([r for r in results if r is not None])
def total_error(params, df):
r, k, Omega, base, scale = params
df_fit = symbolic_fit_all_constants(df, r=r, k=k, Omega=Omega, base=base, scale=scale, max_n=500, steps=500)
threshold = np.percentile(df_fit['error'], 95)
filtered = df_fit[df_fit['error'] <= threshold]
rel_err = ((filtered['value'] - filtered['approx']) / filtered['value'])**2
return rel_err.sum()
if __name__ == "__main__":
print("Parsing CODATA constants from allascii.txt...")
codata_df = parse_codata_ascii("allascii.txt")
print(f"Parsed {len(codata_df)} constants.")
# Use a subset for optimization
subset_df = codata_df.head(20)
init_params = [1.0, 1.0, 1.0, 2.0, 1.0]
bounds = [(1e-5, 10), (1e-5, 10), (1e-5, 10), (1.5, 10), (1e-5, 100)]
print("Optimizing symbolic model parameters...")
res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='L-BFGS-B', options={'maxiter': 100})
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
print("Fitting symbolic dimensions to all constants...")
fitted_df = symbolic_fit_all_constants(codata_df, r=r_opt, k=k_opt, Omega=Omega_opt, base=base_opt, scale=scale_opt, max_n=500, steps=500)
fitted_df_sorted = fitted_df.sort_values("error")
print("\nTop 20 best symbolic fits:")
print(fitted_df_sorted.head(20).to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(fitted_df_sorted.tail(20).to_string(index=False))
fitted_df_sorted.to_csv("symbolic_fit_results.txt", sep="\t", index=False)
plt.figure(figsize=(10, 5))
plt.hist(fitted_df_sorted['error'], bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.show()
plt.figure(figsize=(10, 5))
plt.scatter(fitted_df_sorted['n'], fitted_df_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.show()
fudge7.py
import numpy as np
import pandas as pd
import re
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
import logging
import time
import matplotlib.pyplot as plt
import os
# Set up logging
logging.basicConfig(filename='symbolic_fit.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
# Primes list
PRIMES = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233,
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419,
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
]
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
if n > 100:
return 0.0
term1 = phi**n / np.sqrt(5)
term2 = ((1/phi)**n) * np.cos(np.pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
try:
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
Pn_beta = PRIMES[idx]
# Use logarithmic form to avoid overflow
log_dyadic = (n + beta) * np.log(base)
if log_dyadic > 500: # Prevent overflow
return None
log_val = np.log(scale) + np.log(phi) + np.log(abs(Fn_beta) + 1e-30) + log_dyadic + np.log(Pn_beta) + np.log(Omega)
if n > 1000:
log_val += np.log(np.log(n) / np.log(1000))
if not np.isfinite(log_val):
return None
val = np.exp(log_val)
return np.sqrt(max(val, 1e-30)) * (r ** k)
except Exception as e:
logging.error(f"D failed: n={n}, beta={beta}, error={e}")
return None
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
max_n = min(1000, max(500, int(200 * log_val)))
n_values = np.linspace(0, max_n, 50)
scale_factors = np.logspace(max(log_val - 2, -10), min(log_val + 2, 10), num=10)
try:
for n in tqdm(n_values, desc=f"invert_D for {value:.2e}", leave=False):
for beta in np.linspace(0, 1, 5):
for dynamic_scale in scale_factors:
for r_local in [0.5, 1.0]:
for k_local in [0.5, 1.0]:
val = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
if val is None:
continue
diff = abs(val - value)
candidates.append((diff, n, beta, dynamic_scale, r_local, k_local))
if not candidates:
logging.error(f"invert_D: No valid candidates for value {value}")
return None
candidates = sorted(candidates, key=lambda x: x[0])[:20]
best = candidates[0]
emergent_uncertainty = np.std([D(n, beta, r, k, Omega, base, scale * s)
for _, n, beta, s, r, k in candidates if D(n, beta, r, k, Omega, base, scale * s) is not None])
if not np.isfinite(emergent_uncertainty):
logging.error(f"invert_D: Non-finite emergent uncertainty for value {value}")
return None
return best[1], best[2], best[3], emergent_uncertainty, best[4], best[5]
except Exception as e:
logging.error(f"invert_D failed for value {value}: {e}")
return None
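# Search budget of this invert_D: 50 n values x 5 beta values x 10 scale factors x 2 r_local
# x 2 k_local = 10,000 D evaluations per constant; the 20 closest candidates are kept and the
# spread of their D values is reported as emergent_uncertainty.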
def parse_codata_ascii(filename):
constants = []
pattern = re.compile(r"^\s*(.*?)\s{2,}(\-?\d+\.?\d*(?:\s*[Ee][\+\-]?\d+)?(?:\.\.\.)?)\s+(\-?\d+\.?\d*(?:\s*[Ee][\+\-]?\d+)?|exact)\s+(\S.*)")
with open(filename, "r") as f:
for line in f:
if line.startswith("Quantity") or line.strip() == "" or line.startswith("-"):
continue
m = pattern.match(line)
if m:
name, value_str, uncert_str, unit = m.groups()
try:
value = float(value_str.replace("...", "").replace(" ", ""))
uncertainty = 0.0 if uncert_str == "exact" else float(uncert_str.replace("...", "").replace(" ", ""))
constants.append({
"name": name.strip(),
"value": value,
"uncertainty": uncertainty,
"unit": unit.strip()
})
except Exception as e:
logging.warning(f"Failed parsing line: {line.strip()} - {e}")
continue
return pd.DataFrame(constants)
def check_physical_consistency(df_results):
bad_data = []
relations = [
('Planck constant', 'reduced Planck constant', lambda x, y: abs(x['scale'] / y['scale'] - 2 * np.pi), 0.1, 'scale ratio vs. 2π'),
('proton mass', 'proton-electron mass ratio', lambda x, y: abs(x['n'] - y['n'] - np.log10(1836)), 0.5, 'n difference vs. log(proton-electron ratio)'),
('Fermi coupling constant', 'weak mixing angle', lambda x, y: abs(x['scale'] - y['scale'] / np.sqrt(2)), 0.1, 'scale vs. sin²θ_W/√2'),
('tau energy equivalent', 'tau mass energy equivalent in MeV', lambda x, y: abs(x['value'] - y['value']), 0.01, 'value consistency')
]
for name1, name2, check_func, threshold, reason in relations:
try:
row1 = df_results[df_results['name'] == name1].iloc[0]
row2 = df_results[df_results['name'] == name2].iloc[0]
if check_func(row1, row2) > threshold:
bad_data.append((name1, f"Physical inconsistency: {reason}"))
bad_data.append((name2, f"Physical inconsistency: {reason}"))
except IndexError:
continue
return bad_data
def total_error(params, df_subset):
r, k, Omega, base, scale = params
df_results = symbolic_fit_all_constants(df_subset, base=base, Omega=Omega, r=r, k=k, scale=scale)
if df_results.empty:
return np.inf
error = df_results['error'].mean()
return error if np.isfinite(error) else np.inf
def symbolic_fit_all_constants(df, base=2, Omega=1.0, r=1.0, k=1.0, scale=1.0):
logging.info("Starting symbolic fit for all constants...")
results = []
def process_constant(row):
try:
result = invert_D(row['value'], r=r, k=k, Omega=Omega, base=base, scale=scale)
if result is None:
logging.error(f"Failed inversion for {row['name']}: {row['value']}")
return None
n, beta, dynamic_scale, emergent_uncertainty, r_local, k_local = result
approx = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
if approx is None:
return None
error = abs(approx - row['value'])
rel_error = error / max(abs(row['value']), 1e-30)
return {
'name': row['name'], 'value': row['value'], 'unit': row['unit'],
'n': n, 'beta': beta, 'approx': approx, 'error': error,
'rel_error': rel_error, 'uncertainty': row['uncertainty'],
'emergent_uncertainty': emergent_uncertainty, 'r_local': r_local,
'k_local': k_local, 'scale': scale * dynamic_scale
}
except Exception as e:
logging.error(f"Error processing {row['name']}: {e}")
return None
results = Parallel(n_jobs=-1, timeout=15, backend='loky', maxtasksperchild=100)(
delayed(process_constant)(row) for row in tqdm(df.to_dict('records'), desc="Fitting constants")
)
results = [r for r in results if r is not None]
df_results = pd.DataFrame(results)
if not df_results.empty:
df_results['bad_data'] = False
df_results['bad_data_reason'] = ''
for name in df_results['name'].unique():
mask = df_results['name'] == name
if df_results.loc[mask, 'uncertainty'].notnull().any():
uncertainties = df_results.loc[mask, 'uncertainty'].dropna()
if not uncertainties.empty:
Q1, Q3 = np.percentile(uncertainties, [25, 75])
IQR = Q3 - Q1
outlier_mask = (uncertainties < Q1 - 1.5 * IQR) | (uncertainties > Q3 + 1.5 * IQR)
if outlier_mask.any():
df_results.loc[mask & df_results['uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data'] = True
df_results.loc[mask & df_results['uncertainty'].isin(uncertainties[outlier_mask]), 'bad_data_reason'] += 'Uncertainty outlier; '
high_rel_error_mask = df_results['rel_error'] > 0.5
df_results.loc[high_rel_error_mask, 'bad_data'] = True
df_results.loc[high_rel_error_mask, 'bad_data_reason'] += df_results.loc[high_rel_error_mask, 'rel_error'].apply(lambda x: f"High relative error ({x:.2e} > 0.5); ")
high_uncertainty_mask = df_results['uncertainty'] > 2 * df_results['emergent_uncertainty']
df_results.loc[high_uncertainty_mask, 'bad_data'] = True
df_results.loc[high_uncertainty_mask, 'bad_data_reason'] += df_results.loc[high_uncertainty_mask].apply(
lambda row: f"Uncertainty deviates from emergent ({row['uncertainty']:.2e} vs. {row['emergent_uncertainty']:.2e}); ", axis=1)
bad_data = check_physical_consistency(df_results)
for name, reason in bad_data:
df_results.loc[df_results['name'] == name, 'bad_data'] = True
df_results.loc[df_results['name'] == name, 'bad_data_reason'] += reason + '; '
logging.info("Symbolic fit completed.")
return df_results
def main():
start_time = time.time()
if not os.path.exists("allascii.txt"):
raise FileNotFoundError("allascii.txt not found in the current directory")
df = parse_codata_ascii("allascii.txt")
logging.info(f"Parsed {len(df)} constants")
# Optimize parameters
subset_df = df.head(50)
init_params = [1.0, 1.0, 1.0, 2.0, 1.0] # r, k, Omega, base, scale
bounds = [(1e-5, 10), (1e-5, 10), (1e-5, 10), (1.5, 10), (1e-5, 100)]
print("Optimizing symbolic model parameters...")
res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='L-BFGS-B', options={'maxiter': 100})
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
# Run final fit
df_results = symbolic_fit_all_constants(df, base=base_opt, Omega=Omega_opt, r=r_opt, k=k_opt, scale=scale_opt)
if not df_results.empty:
df_results.to_csv("symbolic_fit_results_emergent_fixed.txt", index=False)
logging.info(f"Saved results to symbolic_fit_results_emergent_fixed.txt")
else:
logging.error("No results to save")
logging.info(f"Total runtime: {time.time() - start_time:.2f} seconds")
# Display results
df_results_sorted = df_results.sort_values("error")
print("\nTop 20 best symbolic fits:")
print(df_results_sorted.head(20)[['name', 'value', 'unit', 'n', 'beta', 'approx', 'error', 'uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(df_results_sorted.tail(20)[['name', 'value', 'unit', 'n', 'beta', 'approx', 'error', 'uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nPotentially bad data constants summary:")
bad_data_df = df_results[df_results['bad_data'] == True][['name', 'value', 'error', 'rel_error', 'uncertainty', 'bad_data_reason']]
print(bad_data_df.to_string(index=False))
df_results_sorted.to_csv("symbolic_fit_results.txt", sep="\t", index=False)
# Plotting
plt.figure(figsize=(10, 5))
plt.hist(df_results_sorted['error'], bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.show()
plt.figure(figsize=(10, 5))
plt.scatter(df_results_sorted['n'], df_results_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.show()
if __name__ == "__main__":
main()
fudge5.py
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
import logging
import time
# Setup logging
logging.basicConfig(level=logging.INFO, filename="symbolic_fit.log", filemode="w")
# Extended primes list (up to 1000)
PRIMES = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233,
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419,
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
]
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
from math import cos, pi, sqrt
phi_inv = 1 / phi
if n > 100:
return 0.0
term1 = phi**n / sqrt(5)
term2 = (phi_inv**n) * cos(pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
Pn_beta = PRIMES[idx]
dyadic = base ** (n + beta)
val = scale * phi * Fn_beta * dyadic * Pn_beta * Omega
if n > 1000:
val *= np.log(n) / np.log(1000)
return np.sqrt(max(val, 1e-30)) * (r ** k)
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=200):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
max_n = min(5000, max(500, int(300 * log_val)))
steps = 100 if log_val < 3 else 200
n_values = np.logspace(0, np.log10(max_n), steps) if log_val > 3 else np.linspace(0, max_n, steps)
scale_factors = np.logspace(log_val - 5, log_val + 5, num=20)
try:
for n in n_values:
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
for r_local in [0.5, 1.0]:
for k_local in [0.5, 1.0]:
val = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
diff = abs(val - value)
candidates.append((diff, n, beta, dynamic_scale, r_local, k_local))
candidates = sorted(candidates, key=lambda x: x[0])[:10]
best = candidates[0]
emergent_uncertainty = np.std([D(n, beta, r, k, Omega, base, scale * s) for _, n, beta, s, r, k in candidates])
return best[1], best[2], best[3], emergent_uncertainty, best[4], best[5]
except Exception as e:
logging.error(f"invert_D failed for value {value}: {e}")
return None
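# Search budget of this invert_D: (100 or 200) n values x 10 beta values x 20 scale factors
# x 2 r_local x 2 k_local = up to 160,000 D evaluations per constant; on any exception the
# function returns a bare None, which fit_single_constant checks for before unpacking.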
def parse_codata_ascii(filename):
constants = []
pattern = re.compile(r"^\s*(.*?)\s{2,}([0-9Ee\+\-\.]+(?:\.\.\.)?)\s+([0-9Ee\+\-\.]+|exact)\s+(\S.*)")
with open(filename, "r") as f:
for line in f:
if line.startswith("Quantity") or line.strip() == "" or line.startswith("-"):
continue
m = pattern.match(line)
if m:
name, value_str, uncert_str, unit = m.groups()
try:
value = float(value_str.replace("...", ""))
uncertainty = 0.0 if uncert_str == "exact" else float(uncert_str.replace("...", ""))
constants.append({
"name": name.strip(),
"value": value,
"uncertainty": uncertainty,
"unit": unit.strip()
})
except Exception as e:
logging.warning(f"Failed parsing line: {line.strip()} - {e}")
continue
return pd.DataFrame(constants)
def check_physical_consistency(df_results):
bad_data = []
relations = [
('Planck constant', 'reduced Planck constant', lambda x, y: x['scale'] / y['scale'] - 2 * np.pi, 0.1, 'scale ratio vs. 2π'),
('proton mass', 'proton-electron mass ratio', lambda x, y: x['n'] - y['n'] - np.log10(1836), 0.5, 'n difference vs. log(proton-electron ratio)'),
('molar mass of carbon-12', 'Avogadro constant', lambda x, y: x['scale'] / y['scale'] - 12, 0.1, 'scale ratio vs. 12'),
('elementary charge', 'electron volt', lambda x, y: x['n'] - y['n'], 0.5, 'n difference vs. 0'),
('Rydberg constant', 'fine-structure constant', lambda x, y: x['n'] - 2 * y['n'] - np.log10(2 * np.pi), 0.5, 'n difference vs. log(2π)'),
('Boltzmann constant', 'electron volt-kelvin relationship', lambda x, y: x['scale'] / y['scale'] - 1, 0.1, 'scale ratio vs. 1'),
('Stefan-Boltzmann constant', 'second radiation constant', lambda x, y: x['n'] + 4 * y['n'] - np.log10(15 * 299792458**2 / (2 * np.pi**5)), 1.0, 'n relationship vs. c and k_B'),
('Fermi coupling constant', 'weak mixing angle', lambda x, y: x['scale'] / (y['value']**2 / np.sqrt(2)), 0.1, 'scale vs. sin²θ_W/√2'),
('tau mass energy equivalent in MeV', 'tau energy equivalent', lambda x, y: x['n'] - y['n'], 0.5, 'n difference vs. 0'),
]
for name1, name2, check_func, threshold, reason in relations:
if name1 in df_results['name'].values and name2 in df_results['name'].values:
fit1 = df_results[df_results['name'] == name1][['n', 'beta', 'scale', 'value']].iloc[0]
fit2 = df_results[df_results['name'] == name2][['n', 'beta', 'scale', 'value']].iloc[0]
diff = abs(check_func(fit1, fit2))
if diff > threshold:
bad_data.append({
'name': name2,
'value': df_results[df_results['name'] == name2]['value'].iloc[0],
'reason': f'Model {reason} inconsistent ({diff:.2e} > {threshold:.2e})'
})
return bad_data
def fit_single_constant(row, r, k, Omega, base, scale, max_n, steps, error_threshold, median_uncertainties):
start_time = time.time()
val = row['value']
if val <= 0 or val > 1e50:
logging.warning(f"Skipping {row['name']}: Invalid value {val}")
return None
try:
result = invert_D(val, r, k, Omega, base, scale, max_n, steps)
if result is None:
logging.error(f"invert_D returned None for {row['name']}")
return None
n, beta, dynamic_scale, emergent_uncertainty, r_local, k_local = result
approx = D(n, beta, r_local, k_local, Omega, base, scale * dynamic_scale)
error = abs(val - approx)
rel_error = error / max(abs(val), 1e-30)
log_val = np.log10(max(abs(val), 1e-30))
# Bad data detection
bad_data = False
bad_data_reason = []
# Uncertainty check
if row['uncertainty'] is not None and row['uncertainty'] > 0:
rel_uncert = row['uncertainty'] / max(abs(val), 1e-30)
if rel_uncert > 0.5:
bad_data = True
bad_data_reason.append(f"High relative uncertainty ({rel_uncert:.2e} > 0.5)")
if abs(row['uncertainty'] - emergent_uncertainty) > 1.5 * emergent_uncertainty or \
abs(row['uncertainty'] - emergent_uncertainty) / max(emergent_uncertainty, 1e-30) > 1.0:
bad_data = True
bad_data_reason.append(f"Uncertainty deviates from emergent ({row['uncertainty']:.2e} vs. {emergent_uncertainty:.2e})")
# Outlier check
if error > error_threshold and row['uncertainty'] is not None:
bin_idx = min(int((log_val + 50) // 10), len(median_uncertainties) - 1)
# A magnitude bin with no recorded median simply skips the check (median_uncertainties may not cover every bin)
median_uncert = median_uncertainties.get(bin_idx)
if median_uncert is not None and row['uncertainty'] > 0 and row['uncertainty'] < median_uncert:
bad_data = True
bad_data_reason.append("High error with low uncertainty")
if row['uncertainty'] > 0 and error > 10 * row['uncertainty']:
bad_data = True
bad_data_reason.append("Error exceeds 10x uncertainty")
# Clear fib_cache after each constant
global fib_cache
fib_cache.clear()
if time.time() - start_time > 5:  # discard fits that took longer than 5 s (checked only after the search completes)
logging.warning(f"Timeout for {row['name']}: {time.time() - start_time:.2f} seconds")
return None
return {
"name": row['name'],
"value": val,
"unit": row['unit'],
"n": n,
"beta": beta,
"approx": approx,
"error": error,
"rel_error": rel_error,
"uncertainty": row['uncertainty'],
"emergent_uncertainty": emergent_uncertainty,
"r_local": r_local,
"k_local": k_local,
"scale": dynamic_scale,
"bad_data": bad_data,
"bad_data_reason": "; ".join(bad_data_reason) if bad_data_reason else ""
}
except Exception as e:
logging.error(f"Failed inversion for {row['name']}: {e}")
return None
def symbolic_fit_all_constants(df, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=200):
logging.info("Starting symbolic fit for all constants...")
# Preliminary fit to get error threshold
results = Parallel(n_jobs=-1, backend='loky')(
delayed(fit_single_constant)(row, r, k, Omega, base, scale, max_n, steps, np.inf, {})
for _, row in df.iterrows()
)
results = [r for r in results if r is not None]
df_results = pd.DataFrame(results)
error_threshold = np.percentile(df_results['error'], 95) if not df_results.empty else np.inf
# Calculate median uncertainties per magnitude bin
log_values = np.log10(df_results['value'].abs().clip(1e-30))
try:
bins = pd.qcut(log_values, 5, duplicates='drop')
except Exception as e:
logging.warning(f"pd.qcut failed: {e}. Using default binning.")
bins = pd.cut(log_values, 5)
median_uncertainties = {}
for bin in bins.unique():
mask = bins == bin
median_uncert = df_results[mask]['uncertainty'].median()
if not np.isnan(median_uncert):
bin_idx = min(int((bin.mid + 50) // 10), 10)
median_uncertainties[bin_idx] = median_uncert
# Final fit with error threshold and median uncertainties
results = Parallel(n_jobs=-1, backend='loky')(
delayed(fit_single_constant)(row, r, k, Omega, base, scale, max_n, steps, error_threshold, median_uncertainties)
for _, row in df.iterrows()
)
results = [r for r in results if r is not None]
df_results = pd.DataFrame(results)
# Physical consistency check
bad_data_physical = check_physical_consistency(df_results)
for bad in bad_data_physical:
df_results.loc[df_results['name'] == bad['name'], 'bad_data'] = True
df_results.loc[df_results['name'] == bad['name'], 'bad_data_reason'] = (
df_results.loc[df_results['name'] == bad['name'], 'bad_data_reason'] + "; " + bad['reason']
).str.strip("; ")
# Uncertainty outlier check using IQR
if not df_results.empty:
for bin in bins.unique():
mask = bins == bin
if df_results[mask]['uncertainty'].notnull().any():
uncertainties = df_results[mask]['uncertainty'].dropna()
q1, q3 = np.percentile(uncertainties, [25, 75])
iqr = q3 - q1
outlier_threshold = q3 + 3 * iqr
df_results.loc[mask & (df_results['uncertainty'] > outlier_threshold), 'bad_data'] = True
df_results.loc[mask & (df_results['uncertainty'] > outlier_threshold), 'bad_data_reason'] = (
df_results['bad_data_reason'] + "; Uncertainty outlier"
).str.strip("; ")
logging.info("Symbolic fit completed.")
return df_results
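# Note: `bins` is computed from the preliminary fit and reused to mask the final DataFrame,
# which assumes the same constants survive both passes; if fewer constants survive the second
# pass, the boolean masks no longer line up with df_results.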
def total_error(params, df):
r, k, Omega, base, scale = params
try:
df_fit = symbolic_fit_all_constants(df, r=r, k=k, Omega=Omega, base=base, scale=scale, max_n=500, steps=200)
threshold = np.percentile(df_fit['error'], 95)
filtered = df_fit[df_fit['error'] <= threshold]
rel_err = ((filtered['value'] - filtered['approx']) / filtered['value'])**2
return rel_err.sum()
except Exception as e:
logging.error(f"total_error failed: {e}")
return np.inf
if __name__ == "__main__":
print("Parsing CODATA constants from allascii.txt...")
start_time = time.time()
codata_df = parse_codata_ascii("allascii.txt")
print(f"Parsed {len(codata_df)} constants in {time.time() - start_time:.2f} seconds.")
# Use a smaller subset for optimization
subset_df = codata_df.head(20)
init_params = [1.0, 1.0, 1.0, 2.0, 1.0]
bounds = [(1e-5, 10), (1e-5, 10), (1e-5, 10), (1.5, 10), (1e-5, 100)]
print("Optimizing symbolic model parameters...")
start_time = time.time()
res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='L-BFGS-B', options={'maxiter': 50})
r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
print(f"Optimization complete in {time.time() - start_time:.2f} seconds. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
print("Fitting symbolic dimensions to all constants...")
start_time = time.time()
fitted_df = symbolic_fit_all_constants(codata_df, r=r_opt, k=k_opt, Omega=Omega_opt, base=base_opt, scale=scale_opt, max_n=500, steps=200)
fitted_df_sorted = fitted_df.sort_values("error")
print(f"Fitting complete in {time.time() - start_time:.2f} seconds.")
print("\nTop 20 best symbolic fits:")
print(fitted_df_sorted.head(20)[['name', 'value', 'unit', 'n', 'beta', 'approx', 'error', 'uncertainty', 'emergent_uncertainty', 'r_local', 'k_local', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nTop 20 worst symbolic fits:")
print(fitted_df_sorted.tail(20)[['name', 'value', 'unit', 'n', 'beta', 'approx', 'error', 'uncertainty', 'emergent_uncertainty', 'r_local', 'k_local', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
print("\nPotentially bad data constants summary:")
bad_data_df = fitted_df[fitted_df['bad_data'] == True][['name', 'value', 'error', 'rel_error', 'uncertainty', 'emergent_uncertainty', 'bad_data_reason']]
print(bad_data_df.to_string(index=False))
fitted_df_sorted.to_csv("symbolic_fit_results_emergent_optimized.txt", sep="\t", index=False)
plt.figure(figsize=(10, 5))
plt.hist(fitted_df_sorted['error'], bins=50, color='skyblue', edgecolor='black')
plt.title('Histogram of Absolute Errors in Symbolic Fit')
plt.xlabel('Absolute Error')
plt.ylabel('Count')
plt.grid(True)
plt.tight_layout()
plt.savefig("error_histogram.png")
plt.close()
plt.figure(figsize=(10, 5))
plt.scatter(fitted_df_sorted['n'], fitted_df_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
plt.title('Absolute Error vs Symbolic Dimension n')
plt.xlabel('n')
plt.ylabel('Absolute Error')
plt.grid(True)
plt.tight_layout()
plt.savefig("error_vs_n.png")
plt.close()
print(f"Total runtime: {time.time() - start_time:.2f} seconds. Check symbolic_fit.log for details.")
fudge1.py
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from tqdm import tqdm
from joblib import Parallel, delayed
# Extended primes list (up to 1000)
PRIMES = [
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233,
239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419,
421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
]
phi = (1 + np.sqrt(5)) / 2
fib_cache = {}
def fib_real(n):
if n in fib_cache:
return fib_cache[n]
from math import cos, pi, sqrt
phi_inv = 1 / phi
if n > 100:
return 0.0
term1 = phi**n / sqrt(5)
term2 = (phi_inv**n) * cos(pi * n)
result = term1 - term2
fib_cache[n] = result
return result
def D(n, beta, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0):
Fn_beta = fib_real(n + beta)
idx = int(np.floor(n + beta) + len(PRIMES)) % len(PRIMES)
Pn_beta = PRIMES[idx]
dyadic = base ** (n + beta)
val = scale * phi * Fn_beta * dyadic * Pn_beta * Omega
val = np.maximum(val, 1e-30)
return np.sqrt(val) * (r ** k)
def invert_D(value, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=500):
candidates = []
log_val = np.log10(max(abs(value), 1e-30))
scale_factors = np.logspace(log_val - 4, log_val + 4, num=20)
max_n = min(5000, max(100, int(200 * log_val)))
steps = min(3000, max(500, int(200 * log_val)))
if log_val > 3:
n_values = np.logspace(0, np.log10(max_n), steps)
else:
n_values = np.linspace(0, max_n, steps)
for n in n_values:
for beta in np.linspace(0, 1, 10):
for dynamic_scale in scale_factors:
val = D(n, beta, r, k, Omega, base, scale * dynamic_scale)
diff = abs(val - value)
candidates.append((diff, n, beta, dynamic_scale))
candidates = sorted(candidates, key=lambda x: x[0])[:10]
best = candidates[0]
return best[1], best[2], best[3]
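# Search budget of this invert_D: up to 3,000 n values x 10 beta values x 20 scale factors
# = up to 600,000 D evaluations per constant; only the 10 closest candidates are kept and the
# best one is returned.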
def parse_codata_ascii(filename):
    constants = []
    pattern = re.compile(r"^\s*(.*?)\s{2,}([0-9Ee\+\-\.]+)\s+([0-9Ee\+\-\.]+|exact)\s+(\S+)")
    with open(filename, "r") as f:
        for line in f:
            if line.startswith("Quantity") or line.strip() == "" or line.startswith("-"):
                continue
            m = pattern.match(line)
            if m:
                name, value_str, uncert_str, unit = m.groups()
                try:
                    value = float(value_str.replace("e", "E"))
                    uncertainty = None if uncert_str == "exact" else float(uncert_str.replace("e", "E"))
                    constants.append({
                        "name": name.strip(),
                        "value": value,
                        "uncertainty": uncertainty,
                        "unit": unit.strip()
                    })
                except ValueError:
                    # skip rows whose numeric fields fail to parse
                    continue
    return pd.DataFrame(constants)
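# check_physical_consistency(df): cross-checks a few constants against relations that
# should hold exactly between them: proton/electron mass vs. the published
# proton-electron mass ratio (flagged if the discrepancy exceeds 5x the ratio's
# uncertainty), speed of light vs. the inverse meter-hertz relationship, and
# Planck constant / (2*pi) vs. the reduced Planck constant.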
def check_physical_consistency(df):
    bad_data = []
    # Mass ratio consistency (e.g., proton-electron mass ratio)
    proton_mass = df[df['name'] == 'proton mass']['value'].iloc[0] if 'proton mass' in df['name'].values else None
    electron_mass = df[df['name'] == 'electron mass']['value'].iloc[0] if 'electron mass' in df['name'].values else None
    proton_electron_ratio = df[df['name'] == 'proton-electron mass ratio']['value'].iloc[0] if 'proton-electron mass ratio' in df['name'].values else None
    if proton_mass and electron_mass and proton_electron_ratio:
        calc_ratio = proton_mass / electron_mass
        diff = abs(calc_ratio - proton_electron_ratio)
        uncert = df[df['name'] == 'proton-electron mass ratio']['uncertainty'].iloc[0]
        if uncert is not None and diff > 5 * uncert:
            bad_data.append({
                'name': 'proton-electron mass ratio',
                'value': proton_electron_ratio,
                'reason': f'Inconsistent with proton mass / electron mass (diff: {diff:.2e} > 5 * {uncert:.2e})'
            })
    # Speed of light vs. inverse meter-hertz relationship
    c = df[df['name'] == 'speed of light in vacuum']['value'].iloc[0] if 'speed of light in vacuum' in df['name'].values else None
    inv_m_hz = df[df['name'] == 'inverse meter-hertz relationship']['value'].iloc[0] if 'inverse meter-hertz relationship' in df['name'].values else None
    if c and inv_m_hz and abs(c - inv_m_hz) > 1e-6:
        bad_data.append({
            'name': 'inverse meter-hertz relationship',
            'value': inv_m_hz,
            'reason': f'Inconsistent with speed of light ({c:.2e} vs. {inv_m_hz:.2e})'
        })
    # Planck constant vs. reduced Planck constant
    h = df[df['name'] == 'Planck constant']['value'].iloc[0] if 'Planck constant' in df['name'].values else None
    h_bar = df[df['name'] == 'reduced Planck constant']['value'].iloc[0] if 'reduced Planck constant' in df['name'].values else None
    if h and h_bar and abs(h / (2 * np.pi) - h_bar) > 1e-10:
        bad_data.append({
            'name': 'reduced Planck constant',
            'value': h_bar,
            'reason': f'Inconsistent with Planck constant / (2π) ({h/(2*np.pi):.2e} vs. {h_bar:.2e})'
        })
    return bad_data
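# fit_single_constant(row, ...): inverts D for one constant and attaches two heuristic
# bad-data flags: "Suspicious uncertainty" (uncertainty below 1e-10 or above 10% of the
# value) and "High error with low uncertainty" (fit error above error_threshold while
# the quoted uncertainty is under 1e-5 of the value). The locally recomputed max_n and
# scale_factors are not used further in this function.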
def fit_single_constant(row, r, k, Omega, base, scale, max_n, steps, error_threshold):
    val = row['value']
    if val <= 0 or val > 1e50:
        return None
    try:
        n, beta, dynamic_scale = invert_D(val, r, k, Omega, base, scale, max_n, steps)
        approx = D(n, beta, r, k, Omega, base, scale * dynamic_scale)
        error = abs(val - approx)
        rel_error = error / max(abs(val), 1e-30)
        log_val = np.log10(max(abs(val), 1e-30))
        max_n = min(5000, max(100, int(200 * log_val)))
        scale_factors = np.logspace(log_val - 4, log_val + 4, num=20)
        # Bad data detection
        bad_data = False
        bad_data_reason = []
        # Uncertainty check
        if row['uncertainty'] is not None:
            if row['uncertainty'] < 1e-10 or row['uncertainty'] > 0.1 * abs(val):
                bad_data = True
                bad_data_reason.append("Suspicious uncertainty")
        # Outlier check
        if error > error_threshold and row['uncertainty'] is not None and row['uncertainty'] < 1e-5 * abs(val):
            bad_data = True
            bad_data_reason.append("High error with low uncertainty")
        return {
            "name": row['name'],
            "value": val,
            "unit": row['unit'],
            "n": n,
            "beta": beta,
            "approx": approx,
            "error": error,
            "rel_error": rel_error,
            "uncertainty": row['uncertainty'],
            "scale": dynamic_scale,
            "bad_data": bad_data,
            "bad_data_reason": "; ".join(bad_data_reason) if bad_data_reason else ""
        }
    except Exception as e:
        print(f"Failed inversion for {row['name']}: {e}")
        return None
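# symbolic_fit_all_constants(df, ...): two-pass parallel fit (n_jobs=20 via joblib).
# Pass 1 runs with an infinite error threshold just to establish the 99th-percentile
# error; pass 2 refits with that threshold so the outlier flag can fire. Afterwards it
# merges in the physical-consistency flags and marks uncertainty outliers
# (> median + 3*std) within quintile bins of log10(|value|).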
def symbolic_fit_all_constants(df, r=1.0, k=1.0, Omega=1.0, base=2, scale=1.0, max_n=500, steps=500):
    # Preliminary fit to get error threshold
    results = Parallel(n_jobs=20)(
        delayed(fit_single_constant)(row, r, k, Omega, base, scale, max_n, steps, np.inf)
        for _, row in df.iterrows()
    )
    results = [res for res in results if res is not None]
    df_results = pd.DataFrame(results)
    error_threshold = np.percentile(df_results['error'], 99) if not df_results.empty else np.inf
    # Final fit with error threshold
    results = Parallel(n_jobs=20)(
        delayed(fit_single_constant)(row, r, k, Omega, base, scale, max_n, steps, error_threshold)
        for _, row in df.iterrows()
    )
    results = [res for res in results if res is not None]
    df_results = pd.DataFrame(results)
    # Physical consistency check
    bad_data_physical = check_physical_consistency(df)
    for bad in bad_data_physical:
        df_results.loc[df_results['name'] == bad['name'], 'bad_data'] = True
        df_results.loc[df_results['name'] == bad['name'], 'bad_data_reason'] = (
            df_results.loc[df_results['name'] == bad['name'], 'bad_data_reason'] + "; " + bad['reason']
        ).str.strip("; ")
    # Uncertainty outlier check
    if not df_results.empty:
        log_values = np.log10(df_results['value'].abs().clip(1e-30))
        bins = pd.qcut(log_values, 5, duplicates='drop')
        for b in bins.unique():
            mask = bins == b
            if df_results[mask]['uncertainty'].notnull().any():
                median_uncert = df_results[mask]['uncertainty'].median()
                std_uncert = df_results[mask]['uncertainty'].std()
                if not np.isnan(std_uncert):
                    df_results.loc[mask & (df_results['uncertainty'] > median_uncert + 3 * std_uncert), 'bad_data'] = True
                    df_results.loc[mask & (df_results['uncertainty'] > median_uncert + 3 * std_uncert), 'bad_data_reason'] = (
                        df_results['bad_data_reason'] + "; Uncertainty outlier"
                    ).str.strip("; ")
    # Clear fib_cache
    global fib_cache
    if len(fib_cache) > 10000:
        fib_cache.clear()
    return df_results
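# total_error(params, df): objective for scipy.optimize.minimize. It refits the subset
# with the candidate (r, k, Omega, base, scale), discards the worst 5% of fits by
# absolute error, and returns the sum of squared relative errors.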
def total_error(params, df):
    r, k, Omega, base, scale = params
    df_fit = symbolic_fit_all_constants(df, r=r, k=k, Omega=Omega, base=base, scale=scale, max_n=500, steps=500)
    threshold = np.percentile(df_fit['error'], 95)
    filtered = df_fit[df_fit['error'] <= threshold]
    rel_err = ((filtered['value'] - filtered['approx']) / filtered['value'])**2
    return rel_err.sum()
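# Driver: parse allascii.txt, optimize the five global parameters on the first 50
# constants with bounded L-BFGS-B, then refit every constant with the optimized
# parameters, print the best/worst 20 fits and the bad-data summary, write the results
# table, and show the two diagnostic plots.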
if __name__ == "__main__":
    print("Parsing CODATA constants from allascii.txt...")
    codata_df = parse_codata_ascii("allascii.txt")
    print(f"Parsed {len(codata_df)} constants.")
    # Use a larger subset (the first 50 constants) for the global optimization
    subset_df = codata_df.head(50)
    init_params = [1.0, 1.0, 1.0, 2.0, 1.0]
    bounds = [(1e-5, 10), (1e-5, 10), (1e-5, 10), (1.5, 10), (1e-5, 100)]
    print("Optimizing symbolic model parameters...")
    res = minimize(total_error, init_params, args=(subset_df,), bounds=bounds, method='L-BFGS-B', options={'maxiter': 100})
    r_opt, k_opt, Omega_opt, base_opt, scale_opt = res.x
    print(f"Optimization complete. Found parameters:\nr = {r_opt:.6f}, k = {k_opt:.6f}, Omega = {Omega_opt:.6f}, base = {base_opt:.6f}, scale = {scale_opt:.6f}")
    print("Fitting symbolic dimensions to all constants...")
    fitted_df = symbolic_fit_all_constants(codata_df, r=r_opt, k=k_opt, Omega=Omega_opt, base=base_opt, scale=scale_opt, max_n=500, steps=500)
    fitted_df_sorted = fitted_df.sort_values("error")
    print("\nTop 20 best symbolic fits:")
    print(fitted_df_sorted.head(20)[['name', 'value', 'unit', 'n', 'beta', 'approx', 'error', 'uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
    print("\nTop 20 worst symbolic fits:")
    print(fitted_df_sorted.tail(20)[['name', 'value', 'unit', 'n', 'beta', 'approx', 'error', 'uncertainty', 'scale', 'bad_data', 'bad_data_reason']].to_string(index=False))
    print("\nPotentially bad data constants summary:")
    bad_data_df = fitted_df[fitted_df['bad_data'] == True][['name', 'value', 'error', 'rel_error', 'uncertainty', 'bad_data_reason']]
    print(bad_data_df.to_string(index=False))
    fitted_df_sorted.to_csv("symbolic_fit_results.txt", sep="\t", index=False)
    plt.figure(figsize=(10, 5))
    plt.hist(fitted_df_sorted['error'], bins=50, color='skyblue', edgecolor='black')
    plt.title('Histogram of Absolute Errors in Symbolic Fit')
    plt.xlabel('Absolute Error')
    plt.ylabel('Count')
    plt.grid(True)
    plt.tight_layout()
    plt.show()
    plt.figure(figsize=(10, 5))
    plt.scatter(fitted_df_sorted['n'], fitted_df_sorted['error'], alpha=0.5, s=15, c='orange', edgecolors='black')
    plt.title('Absolute Error vs Symbolic Dimension n')
    plt.xlabel('n')
    plt.ylabel('Absolute Error')
    plt.grid(True)
    plt.tight_layout()
    plt.show()