Juan D. Correa - Software Developer/Linux System Administration
astropema@gmail.com
Current Projects
Below are active creative and technical projects that merge systems thinking, AI research, symbolic language, and practical Linux system administration.
Astro Pema
Overview: Astro Pema is a mythopoetic machine intelligence project that reimagines astrology as a symbolic language for
exploring consciousness. It combines traditional astrological logic, planetary pattern databases,
and vector representations with modern SLM/LLM-based narrative synthesis.
System Goals: The goal isn’t to reproduce astrology but to use it as a structure to generate symbolic prompts, reflect mythic intelligence, and explore emergent semantic space through language models.
Core Components:
- PostgreSQL database (astropema) stores over 3,000 curated planetary aspect interpretations.
- Custom Python scripts handle birth chart parsing, JSON formatting, and prompt generation for language models.
- GGUF-compatible SLMs such as mistral-7b-instruct and mythomax produce structured symbolic narratives.
- Web interface (in progress) that lets users generate their own charts and receive mythopoetic readings synthesized from the natal chart. Each reading is unique to the individual, because each combination of planetary keys is unique when time of birth is included.
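The prompt-generation step described above can be sketched in miniature: look up a curated interpretation for each chart aspect and join the results into a single narrative prompt. The interpretation snippets and function names below are illustrative placeholders, not the project's actual database content.

```python
# Illustrative sketch of Astro Pema's prompt generation: map natal-chart
# aspects to curated interpretation snippets and assemble one SLM prompt.
# These interpretation texts are placeholders, not real database rows.

INTERPRETATIONS = {
    ("Sun", "trine", "Moon"): "inner and outer selves flow in harmony",
    ("Mars", "square", "Saturn"): "drive meets discipline under tension",
}

def build_prompt(aspects):
    """Assemble a mythopoetic prompt from (planet, aspect, planet) keys."""
    lines = [
        f"{a} {asp} {b}: {INTERPRETATIONS[(a, asp, b)]}"
        for a, asp, b in aspects
        if (a, asp, b) in INTERPRETATIONS
    ]
    return ("Weave these aspect meanings into one mythopoetic reading:\n"
            + "\n".join(lines))

prompt = build_prompt([("Sun", "trine", "Moon"), ("Mars", "square", "Saturn")])
```

In the real system the lookup would hit the astropema PostgreSQL database rather than an in-memory dict, and the assembled prompt would be passed to the GGUF model.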
Tech Stack:
- Python (LLM logic, JSON processing)
- PostgreSQL (chart + interpretation database)
- Bash & SSH (local server management across 4 Linux boxes)
- Frontend: HTML/CSS, future use of Flask or PHP-based UI
Hardware: Astro Pema runs on a cluster of repurposed laptops and desktops:
- roca – main execution node (Linux, no GPU)
- oba – SLM host with optimized GGUF models
- pema, coyote – storage, backup, and database servers
Project Status: Currently running local experiments with model-driven synthesis. Future work includes refining the database, improving prompt variation, and developing a fully interactive frontend for public access.
Atari Deep Reinforcement Learning Project
Project Scope:
This project explores deep reinforcement learning by training an AI agent to master classic Atari 2600 games—specifically, Breakout—using the Arcade
Learning Environment (ALE) and the DQN (Deep Q-Network) architecture implemented through Stable-Baselines3. The primary objective is to train a model
from scratch on local hardware using custom Python code, visual feedback via TensorBoard, and video capture of agent behavior across training milestones.
Why Atari Still Matters in AI
The path from video games to advanced artificial intelligence might sound like science fiction, but it's real—and it starts with Atari.
In 2013, DeepMind’s groundbreaking work showed that a single deep neural network could learn to play dozens of Atari 2600 games using only raw pixel input and reward signals. The algorithm, known as Deep Q-Network (DQN), didn’t need hand-crafted features or pre-programmed strategies—it learned by playing.
Atari games provided the perfect training ground: standardized environments, deterministic rules, visual complexity, and delayed rewards. Mastering them was a critical milestone in proving that deep reinforcement learning could handle real-world-like complexity.
The same core ideas—trial-and-error learning, value estimation, policy optimization—are now used to train robots, self-driving cars, conversational agents, and autonomous drones. Even today, many research labs still benchmark algorithms against the Arcade Learning Environment (ALE).
In that sense, beating Breakout isn’t just retro fun—it’s a rite of passage for AI agents. And for human developers, it’s an elegant way to understand how learning, memory, and decision-making can emerge from feedback and experience.
Technical Stack
- Framework: PyTorch + Stable-Baselines3
- Environment: Gymnasium with ALE + custom ROM handling
- Model: DQN with CNN policy, experience replay, exploration decay
- Tooling: TensorBoard, imageio for video, cron-based job scheduling
- Hardware: Local CPU-based Linux machine (no GPU), Ubuntu 24.04, 16 GB RAM
Training Protocol
We train the model daily in long-running sessions (~12-18 hours), saving checkpoints every 100,000 timesteps. A cron job launches training sessions automatically. After each session, the model is evaluated by generating an .mp4 video of gameplay to visually assess improvement. TensorBoard logs allow for insight into loss reduction, episode reward progression, and Q-value stability.
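The exploration and checkpoint schedule described above can be sketched as follows. The epsilon values mirror common Stable-Baselines3 DQN defaults (exploration over the first fraction of training) and are assumptions, not the project's actual hyperparameters.

```python
# Sketch of SB3-style linear exploration decay for DQN: epsilon falls
# linearly from eps_start to eps_end over `fraction` of total training,
# then stays flat. Defaults here are illustrative, not project settings.

def epsilon(step, total_steps, fraction=0.1, eps_start=1.0, eps_end=0.05):
    progress = min(step / (fraction * total_steps), 1.0)
    return eps_start + progress * (eps_end - eps_start)

# Checkpoint schedule: save every 100,000 timesteps, as in the protocol above.
def checkpoint_steps(total_steps, every=100_000):
    return list(range(every, total_steps + 1, every))
```

For a 2-million-timestep day of training this yields twenty checkpoints, each of which can be rendered to video for the visual evaluation pass.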
Goals
- Achieve consistent episode rewards >10 in Breakout
- Learn to interpret TensorBoard metrics to inform architecture and hyperparameter tuning
- Develop a full feedback loop: train → evaluate → adjust → retrain
- Build resilience through CPU-only training and memory constraints
- Establish reproducible results through versioned models and logs
Challenges
Given the lack of GPU acceleration, the agent is trained slowly—roughly 2 million timesteps per 24 hours. Replay buffers and checkpoints must be carefully managed to avoid memory saturation. Careful use of logging and rendering ensures progress can be tracked even without real-time monitoring.
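A quick back-of-envelope calculation shows why buffer management matters on 16 GB of RAM. The frame-stack shape and the assumption that both observation and next-observation are stored are typical Atari/DQN settings, not measured values from this setup.

```python
# Back-of-envelope estimate of DQN replay-buffer memory for Atari:
# each stored observation is a stack of 4 grayscale 84x84 uint8 frames.
# Numbers are illustrative; real usage depends on buffer configuration.

def replay_buffer_bytes(capacity, frame_shape=(4, 84, 84), store_next_obs=True):
    per_transition = 1
    for dim in frame_shape:
        per_transition *= dim  # one byte per uint8 pixel
    if store_next_obs:  # buffers often store next_obs too unless memory-optimized
        per_transition *= 2
    return capacity * per_transition

gib = replay_buffer_bytes(100_000) / 2**30  # roughly 5.3 GiB for 100k transitions
```

This is why a naive 1M-transition buffer (the DeepMind default) is out of reach here, and why capacity and checkpoint pruning have to be tuned to the machine.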
Next Steps
- Compare DQN with PPO and A2C on the same game
- Run visual evaluations across different checkpoint stages
- Eventually test on GPU hardware to compare acceleration gains
- Publish a live training log via web interface and integrate result video playback
Why It Matters
This project is both a technical testbed and a philosophical experiment in developing autonomous agents using minimal resources. It demonstrates what’s possible through determination, iterative design, and disciplined system administration—without relying on cloud APIs or commercial platforms. The learned behaviors of this AI agent represent a bridge between game mechanics and emerging intelligence.
LLM-SLM-Assisted Knowledge Database (PlantDB)
Project Scope:
This ongoing project explores the use of local small language models (SLMs) and larger hosted LLMs to generate, structure, and insert scientifically meaningful data into a custom PostgreSQL database. Our focus has been on medicinal plant knowledge from the Veracruz region in Mexico—leveraging generative models to synthesize structured information from minimal prompts (e.g., Latin names).
Pipeline Architecture
- Language Models:
  - SLM: mistral-7b-instruct-v0.2.Q5_K_M.gguf via llama.cpp
  - LLM fallback: Gemini or OpenAI GPT-4 via web interface or scripted offline fallback
- Runtime: Python script running locally via llama_cpp bindings (quantized models)
- Prompt Template: Custom per-plant template that takes a Latin name and asks for region, uses, compounds, and preparations
- Timing: Each entry takes 90–150 seconds depending on model and prompt complexity
- Data Format: Output parsed and stored as structured PostgreSQL rows with raw output saved
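A minimal sketch of the per-plant prompt template and output parsing, assuming a simple "Label: value" response format; the template wording and field labels are illustrative, not the production prompt.

```python
# Sketch of the per-plant prompt template and a simple field parser.
# Template wording and field labels are illustrative assumptions.

PROMPT_TEMPLATE = (
    "You are an ethnobotanical assistant. For the plant {latin_name}, "
    "found in the Veracruz region of Mexico, provide:\n"
    "Common name:\nMedicinal use:\nPreparation:\nCompounds:"
)

def make_prompt(latin_name):
    return PROMPT_TEMPLATE.format(latin_name=latin_name)

def parse_fields(raw_output):
    """Parse 'Label: value' lines from model output into a dict."""
    fields = {}
    for line in raw_output.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return fields

example = parse_fields("Common name: muicle\nMedicinal use: blood tonic")
```

The raw model text is kept alongside the parsed fields, which matches the schema's raw_output column and makes re-parsing possible if the template evolves.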
Database Schema
PostgreSQL table medicinal_plants contains the following fields:
- latin_name
- common_name
- region
- medicinal_use
- preparation
- compounds
- source (e.g., model version)
- raw_output (original model text)
- prompt
- model_version
Process Summary
Plant names (Latin binomials) are read from a text file and processed one by one. The prompt is dynamically generated, sent to the SLM, and the output is parsed and logged (both to screen and a versioned log file). PostgreSQL insertion is handled via psycopg2. Total run time is recorded per run for benchmarking across models.
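The insertion step might look like the following sketch, which builds a parameterized INSERT over the schema fields listed above. In the real pipeline the statement would be executed with psycopg2's cur.execute(sql, row) inside a transaction; it is shown here as statement construction only.

```python
# Sketch of the insertion step: build a parameterized INSERT for the
# medicinal_plants table using the schema fields listed above. With
# psycopg2 this would run as cur.execute(sql, row) in a transaction.

COLUMNS = ["latin_name", "common_name", "region", "medicinal_use",
           "preparation", "compounds", "source", "raw_output",
           "prompt", "model_version"]

def build_insert(table="medicinal_plants"):
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    return (f"INSERT INTO {table} ({', '.join(COLUMNS)}) "
            f"VALUES ({placeholders})")

sql = build_insert()
```

Using %s placeholders (rather than string formatting) lets the driver handle quoting and escaping, which matters when raw_output contains arbitrary model text.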
Hardware
- Model host: Local Ubuntu 24.04 box running llama.cpp with 16 GB RAM
- Database: PostgreSQL 16, hosted on the same system
- SLM performance: ~2 minutes per query with Q5_K_M quantization
Usage Philosophy
Rather than "extract" data, the models are tasked with synthesizing culturally rooted, biologically informed summaries. This combines computational creativity with traditional knowledge systems—respectfully and with attribution to the model as source. This work also aims to explore the role of language models in digital ethnobotany and modern herbology.
Next Steps
- Add support for batch insertion that skips duplicates
- Automate extraction of structured fields (e.g., common name, chemical constituents)
- Build a front-end interface for real-time query, search, and display
- Evaluate GPT-generated entries against verified botanical sources
- Train a smaller fine-tuned model on the dataset created from this work
Broader Application
This method of SLM-assisted database generation can be adapted to other domains: traditional medicine, local biodiversity indexing, cultural archives, or knowledge capture from oral history. It demonstrates the ability of small models to structure domain knowledge on local hardware—democratizing access to AI-enhanced research tools.
Personal Portfolio and Research Hub
Project Scope:
This website is an evolving project—serving both as a digital portfolio and as a testing ground for design, data presentation, and backend interfacing. It is intended to showcase current technical projects, long-term research, and personal creative experiments across disciplines.
Technical Stack
- Frontend: HTML5, CSS3, embedded PDFs, video, and iframes
- Backend (planned): PHP for PostgreSQL interfacing
- Styling: Minimalist CSS with semi-transparent elements to support content readability over a dark background image
- Hosting: Self-hosted on a Linux server using Apache2
- Content Type: Static project previews, notebooks, PDF reports, embedded simulations, and AI-generated visualizations
Design Philosophy
The visual design prioritizes clarity and creative flow. Background imagery and minimal shadows give each section depth, while semi-transparent containers ensure text readability without sacrificing aesthetics. The site is built to evolve incrementally—each new project gets integrated live, as it matures.
Goals
- Showcase a wide range of work from ML, DL, simulations, and system architecture
- Document research and learning with linked notebooks and media
- Create a foundation for more dynamic web-based tools (e.g. chart generators, plant DB search)
- Maintain low dependencies and full local control over deployment
Challenges
Because the site is hand-built without frameworks, each visual and layout change requires precision. Ensuring full browser compatibility and fast load times while embedding heavier assets (like videos and notebooks) adds complexity.
Next Steps
- Integrate a PHP-based query system to interface with the medicinal plant PostgreSQL DB
- Develop a simple backend for users to generate custom astrological reports from natal chart data
- Refine mobile responsiveness and explore light/dark mode toggles
- Eventually containerize for portable deployment and possible remote backups