HOWTO

Prerequisites

  1. Create an empty videos.txt file under the root directory.

  2. Create a directory for each month of the year under the root directory.

  3. Add a header.md file under each monthly directory with the following content. Example for January:

    <!-- toc --><!-- tocstop -->
# January 2025

RIAY January 2025

markdown-toc-gen won’t generate the table of contents for the monthly markdown (in this case, January.md) without the mandatory <-- toc --> and <!-- tocstop --> comments header.

You can replace the top-level markdown header

# January 2025

RIAY January 2025

with your own if you wish.

Add a compact.txt file with the first line as header.md under each monthly directory. This ensures the presence of the header for each month’s markdown.

Note: You can do all this by simply executing the script setup.

Environment variables

  1. Export an environment variable GITHUB_USERNAME by adding the following line to your .bash_profile file.

  2. Substitute your Github user id for <userid>.

    export GITHUB_USERNAME="<userid>"
    
  3. Export environment variable YOUTUBE_API_KEY by adding the following line to .bash_profile.

    export YOUTUBE_API_KEY=<api_key>
    

    Substitute your Google API Key which can access YouTube Data API. Set up your API key using instructions at https://support.google.com/googleapi/answer/6158862?hl=en

  4. Export environment variable DEEPSEEK_API_KEY by adding the following line to .bash_profile.

    export DEEPSEEK_API_KEY=<api_key>
    

    Substitute your DeepSeek API Key which can access YouTube Data API. Set up your API key using instructions at https://docs.aicontentlabs.com/articles/deepseek-api-key/

  5. Export environment variable DEEPINFRA_TOKEN by adding the following line to .bash_profile.

    export DEEPINFRA_TOKEN=<api_key>
    

Note: You will need to set up either DeepSeek or Gemini API keys for AI-generation of podcast summaries.

Configuration

The configuration file config.env is located in the repo’s root directory.

It contains the following settings:

General Settings

# Project config values
PROJECT="Rosary In A Year (RIAY)"
# Turn on or turn off logging
LOGGING=false
# The year in which the podcasts are being followed
YEAR=2025

# Github config values
REPO_OWNER=linusjf
REPO_NAME=RIAY

Overlay Icon Configuration

ICON_FILE="play-button.png"
ICON_SIZE="256x256"
ICON_OFFSET="+32+0"
ICON_COMMENT="Play Icon Added"

File Configuration

COMPACT_FILE="compact.txt"
VIDEOS_FILE="videos.txt"

API Request Settings

# Maximum number of retries for a REST API call
CURL_MAX_RETRIES=5
# Initial retry delay (seconds) - increases exponentially
CURL_INITIAL_RETRY_DELAY=2
# Connection timeout for curl requests (seconds)
CURL_CONNECT_TIMEOUT=30
# Maximum time for curl operations (seconds)
CURL_MAX_TIME=90

Transcription Settings

# Whether to transcribe videos
TRANSCRIBE_VIDEOS=false
# Whether to transcribe locally
TRANSCRIBE_LOCALLY=false
# Whether to use faster-whisper python library
USE_FASTER_WHISPER=true
# Whether to enable failover mode
ENABLE_FAILOVER_MODE=true

# ASR API KEY
ASR_LLM_API_KEY="$DEEPINFRA_TOKEN"
# ASR LLM base url
ASR_LLM_BASE_URL="https://api.deepinfra.com/v1"
# ASR LLM endpoint
ASR_LLM_ENDPOINT="/openai/audio/transcriptions"
# ASR LLM model
ASR_LLM_MODEL="openai/whisper-large-v3"
# ASR local model
ASR_LOCAL_MODEL="small"
# Initial prompt for whisper
ASR_INITIAL_PROMPT="In Ascension Press' Rosary in a Year podcast, Fr. Mark-Mary Ames meditates with sacred art, saint writings, and scripture. Poco a poco."
# Whether to carry initial prompt
ASR_CARRY_INITIAL_PROMPT=true
# Beam size for ASR
ASR_BEAM_SIZE=5

YouTube Settings

# Number of retries for yt-dlp
YT_DLP_RETRIES=20
# Timeout for yt-dlp (seconds)
YT_DLP_SOCKET_TIMEOUT=30
# Captions output directory
CAPTIONS_OUTPUT_DIR="captions"

AI Model Settings

# Temperature for LLM responses (0 for deterministic output)
TEMPERATURE=0.5

# text llm api key
TEXT_LLM_API_KEY="$DEEPSEEK_API_KEY"
# text llm base url
TEXT_LLM_BASE_URL="https://api.deepseek.com"
# text llm chat model endpoint
TEXT_LLM_CHAT_ENDPOINT="/chat/completions"
# text llm model used for summarization
TEXT_LLM_MODEL="deepseek-chat"

# The system prompt for summarizing text.
SYSTEM_SUMMARY_PROMPT="You are a helpful assistant that summarizes content. Be concise, helpful."
# The prompt for summarizing chunks within the podcast transcript.
CHUNK_SUMMARY_PROMPT="Summarize this text, excluding plugs, branding, and promotions. Avoid mention of Day, podcast and Rosary in a Day. Additionally, exclude repetitive prayers such as Our Father, Hail Mary and Glory Be. Retain details of artwork decribed including title, artist (full name), current location of artwork, medium, style, date (year, decade and century) and brief description."
# The prompt for the final summary of all the chunk summaries
FINAL_SUMMARY_PROMPT="Summarize the following text in the voice and style of C.S. Lewis, employing his clarity, moral insight, and rhetorical flair, as if writing directly to an intelligent but unassuming reader. Generate a title as well. Avoid modern jargon, maintain a tone of gentle conviction, and present the ideas as timeless truths. Do not include any meta-commentary, footnotes, or explanations. Retain details of artworks. Start with a level three markdown header
'### AI-Generated Summary: '. Append your generated title (no colons, proper grammar) to the header."
# The meta-prompt to generate image prompt and caption
SUMMARY_IMAGE_META_PROMPT='You are an assistant that extracts visual meaning from text. Given any descriptive input, generate:
1. "image_prompt" - a detailed, visual description suitable for generating an image.
2. "caption" - a brief, expressive caption that summarizes or enhances the image concept. No colons.
Return your response as a JSON object in the format: {"caption": "This is the generated caption", "image_prompt": "This is the image prompt"}'
CAPTION_PROMPT='You are an assistant who extracts visual meaning from text related to Christian art. Given any descriptive input, generate:
1. "caption" - a brief, expressive caption that summarizes or enhances the text using any information you have concerning Christian art the text identifies.
Return your response as a JSON object in the format: {"caption": "This is the generated caption"}'
SUMMARY_ARTWORK_DETAILS_PROMPT="From the following text, extract any described artwork and return a JSON object with:
\"details\": A detailed summary including the artwork's title, artist, date, medium, style, current location of artwork (location) and subject. Fill values with empty string if not specified in the text.
\"filename\": A lowercase alphanumeric string (no hyphens or underscores), 20 characters or fewer, representing a mostly unique filename derived from the title, artist, and date.
\"caption\": A caption (in proper English), no more than 20 words, that uses title, artist, date,location, medium and subject (in that ordered priority).
If no artwork is described in the text, return an empty JSON object {}."

Image Generation Settings

# Whether to auto-generate images
AUTO_GENERATE_IMAGES=false
# Path to image generation script
IMAGE_GENERATION_SCRIPT="openaigenerateimage"
# Deepinfra image generation model
DEEPINFRA_IMAGE_GENERATION_MODEL="stabilityai/sd3.5"
# Fal.ai image generation model
FALAI_IMAGE_GENERATION_MODEL="janus"

Art Downloader Settings

# Whether to auto-download art
AUTO_DOWNLOAD_ART=false
# Art downloader directory
ART_DOWNLOADER_DIR="artdownloads"
# Whether to verify art images
VERIFY_ART_IMAGES=true
# Art verifier script
ART_VERIFIER_SCRIPT="matchimagetometadata.py"
# Art metadata prompt
ART_METADATA_PROMPT="You are an expert on Christian art.
Provided the following details about the artwork '{}'
analyze the image and generate the following detailed metadata in json format (no markdown, no nesting of attributes, all top-level):
\"title\": The title of the artwork,
\"artist\": The artist or artists,
\"medium\": oil on canvas, fresco, marble sculpture, etc.,
\"location\": where the artwork is currently located,
\"date\": the creation year and century,
\"style\": the artistic style or artistic school that influenced the art,
\"description\": Description of the artwork (including visual elements, composition, subject matter, and style).
\"image_color\": Whether the image is in Color, Grayscale, Monochrome, Duotone/Tritone, Sepia,Color-tinted grayscale, Black-and-white etc.
\"watermarked\": Whether the image is watermarked or not.
\"caption\": A caption (in proper English), no more than 20 words, that uses title, artist, date, location, medium and description (in that ordered priority).
\"analyzed\": Whether the analysis was possible or not.
\"comments:\": Your comments other than the fields above and if analysis was possible or not and why.
Do not add any extraneous information that will mangle the json object expected."
# Image content validation strictness
IMAGE_CONTENT_VALIDATION="lenient"
# Vector embeddings model api key
VECTOR_EMBEDDINGS_MODEL_API_KEY="$DEEPINFRA_TOKEN"
# Vector embeddings provider base url
VECTOR_EMBEDDINGS_BASE_URL="https://api.deepinfra.com/v1/openai"
# Vector embeddings model
VECTOR_EMBEDDINGS_MODEL="thenlper/gte-large"
# Whether to look for alternate images
FIND_ALTERNATE_IMAGES=false

Content Documentation Settings

# List of files for generating documentation
CONTENT_DOCS=(
  "RIAY=start.md"
  "January=January.md"
  "February=February.md"
  "March=March.md"
  "April=April.md"
  "May=May.md"
  "June=June.md"
  "July=July.md"
  "August=August.md"
  "September=September.md"
  "October=October.md"
  "November=November.md"
  "December=December.md"
)

The config values can be modified as per your preferences.

Vale

Initialize vale styles by executing the command vale sync. This should download the specified styles in .vale.ini.

Add the daily Youtube video

Execute the script addvideo with the following parameters:

  • video id - the id of the Youtube video

  • caption or title (in double quotes)

Example:

./addvideo 5I2BbalTOPo "Hagar and Ishmael"

Results:

  1. Computes the day of year from the length of the videos.txt file. day of year = (number of lines in videos.txt) + 1 In this case, 10.

  2. Appends the Video id to the file videos.txt present in the root directory.

  3. Generates markdown file Day010.md in the January subdirectory.

    1. This markdown file has a link to the Youtube video.

    2. It also has AI-Generated summary of the podcast.

  4. Generates image file Day010.jpg in the January/jpgs directory.

  5. Appends Day010.md filename to the January/compact.txt file.

  6. Updates January.md file in the root directory with the contents of Day010.md.

  7. Updated files:

    1. ./videos.txt

    2. ./January.md

    3. ./January/compact.txt

  8. Created files:

    1. ./January/Day010.md

    2. ./January/jpgs/Day010.jpg

Add additional Youtube video to a day

When you have to add an additional video to the markdown for that day, you can execute the script addvideotoday with the following parameters:

  • video id - the id of the Youtube video

  • day of year - the day of the year for which the video is to be added

Example:

./addvideotoday 5I2BbalTOPo 21

Results:

  1. Updates markdown file Day021.md in the January subdirectory.

    1. This markdown file has a link to the Youtube video.

    2. It also has AI-Generated summary of the podcast.

  2. Generates image file <videoid>.jpg in the January/jpgs directory.

  3. Updates January.md file in the root directory with the contents of Day021.md.

  4. Updated files:

    1. January/Day021.md

    2. ./January.md

  5. Created files:

    1. ./January/jpgs/<videoid>.jpg

Add today’s sharing

  1. First, add today’s video.

  2. Edit the generated Dayxxx.md file for today.

  3. Paste the sharing text into the file adding appropriate markdown headers as needed.

  4. Save the file.

  5. Execute script genmonth with the following parameters:

  • month index - 1 - 12

  • optional four digit year - 20XX

    Example:

    ./genmonth 01 2025
    OR
    
    ./genmonth 01
    # The year value will be picked from the environment variable YEAR
    

    Results:

    Updates the January.md file with the sharing text added to the Day010.md file.

You can add sharing to other days as well in a similar fashion. Don’t forget to execute genmonth with the appropriate month index for that day.

You can get the month index by executing the following bash command:

date --date="$(date --date='jan 1 + 30 days' '+%B %d,      %Y')" +%m

Decrement the day of year by 1 and substitute it in the command. The preceding gives the month index for day 31.

Merge monthly markdown files into one large README

  1. Edit the stitch.md file provided to include the markdown files you wish to merge.

  2. The file format follows:

    # README
    
    - [RIAY](start.md)
    - [January](January.md)
    - [February](February.md)
    - [March](March.md)
    - [April](April.md)
    - [May](May.md)
    - [June](June.md)
    - [July](July.md)
    - [August](August.md)
    - [September](September.md)
    - [October](October.md)
    - [November](November.md)
    - [December](December.md)
    

    Include or exclude any files you need or don’t need.

  3. Execute the stitch script.

./stitch

Results:

Generates README with all the contents of the listed markdown files in stitch.md.

Generate table of contents for markdown files

Execute the gentoc script as follows:

./gentoc <path to markdown file>

Before executing the script, update the file and place the comment <!-- toc --> and <!-- tocstop --> to generate the table of contents inside these markers.

Results:

Generates the table of contents per the existing headings in the markdown file.

Using the commands utility that accepts text commands in file commands.txt

  1. Install ANTLR4

Use pyenv to install and set up your Python3 environment.

Example of setting up your Python and ANTLR4 environment

pyenv install 3.10
pyenv global 3.10
pip install antlr4-tools antlr4-python3-runtime python-dotenv
  1. Add commands to the commands.txt file.

Available commands:

  • addvideo

  • addvideotoday

  • addimgtoday

  • genmonth

  • lintall

  • stitch

  • gentoc

Note: The addvideotoday and addimgtoday commands need the day of year to be a three digit number. Hence, 1 becomes 001, 20 becomes 020 and 99 is 099.

For simplicity and consistency, the commands wrap their command line equivalents.

  1. Execute the commands.py script.

./commands.py

This executes the commands in order as placed in the commands.txt. If any command fails, the program outputs an error message for that command and executes all following commands.

  1. Example commands.txt file

# example commands
addvideo "abc123456" "Example video" # add example video
addvideotoday "abc123456" 010 # add example video to day 10 markdown.
genmonth 01 2025 # generate markdown for month January, 2025
lintall # lint all the markdown files

The program ignores everything after the # symbol and treats it like a new line character.