Including Python Code by Line Numbers

1. Intro

For quite a few problems I have a python program, which I want to keep as a separate file, and a \(\LaTeX\) or org mode file that explains or contains part of that program. It is easy to include part of the code by using line numbers, like so:

\inputminted[firstline=12, lastline=23]{python}{python_file.py}

However, if I add code above line 12 or between lines 12 and 23, or move the block of code altogether, I have to update the line numbers by hand. As I don't like this type of work, I decided to write some code that does it for me, for both \(\LaTeX\) and orgmode files.

2. Plan given to ChatGPT

As a first step, I drafted a list of requirements that my python program had to satisfy. Then I gave this list to ChatGPT and got a program that did not work completely, but was 80% or so correct. I was amazed! Here are my initial requirements.

  1. In a \(\LaTeX\) file look up lines with strings that contain inputminted.
  2. Look up a tag that appears in such lines as a comment at the end of a line. For instance, tictoc is the comment at the end of this line \inputminted[firstline=30, lastline=35]{python}{python_file.py} % tictoc.
  3. Also look up the name of the python file after the \inputminted command, here python_file.py.
  4. Open the python file, and look up the line numbers of the code between the lines tagged with the comment # block tictoc.
  5. Update the firstline= and the lastline= in the \(\LaTeX\) file accordingly.
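Requirements 1 to 3 boil down to one regular expression per line. The sketch below assumes the exact spacing `firstline=N, lastline=M` used in this post and extracts the python file name, the tag, and the current line numbers; `parse_minted_line` is a hypothetical helper, not part of the final script.

```python
import re

# Assumes the layout used in this post:
# \inputminted[firstline=N, lastline=M]{python}{file.py} % tag
MINTED_LINE = re.compile(
    r"\\inputminted\[firstline=(\d+), lastline=(\d+)\]"
    r"\{python\}\{([^}]+\.py)\}\s*%\s*(\w+)"
)


def parse_minted_line(line):
    """Return (python_file, tag, firstline, lastline), or None if the line has no tag."""
    m = MINTED_LINE.search(line)
    if m is None:
        return None
    first, last, python_file, tag = m.groups()
    return python_file, tag, int(first), int(last)


example = r"\inputminted[firstline=30, lastline=35]{python}{python_file.py} % tictoc"
print(parse_minted_line(example))  # ('python_file.py', 'tictoc', 30, 35)
```

A line without a trailing `% tag` returns None, so untagged listings are simply left alone.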

BTW, as it's easy to copy and move complete lines, I use the same string to demarcate the starting and terminating lines; in other words, I don't use comments such as # begin tictoc and # end tictoc.

It took a few more rounds with ChatGPT, and some work of my own, but the final result works nicely for my goals. Once the version worked for \(\LaTeX\), I updated it so that it also works with org mode files.

So, for a \(\LaTeX\) file, tag like this:

\inputminted[firstline=12, lastline=23]{python}{python_file.py} % tictoc

and for an orgmode file, like this:

#+INCLUDE: "python_file.py" src python:lines "84-110" ## tictoc

Note the intentional double ## to comment the tag.
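The two formats count differently. minted's firstline and lastline are both inclusive, whereas, according to the org manual, the upper end of a :lines range is not included, so "84-110" covers lines 84 to 109. This is why the org updater writes end_index+1. A minimal sketch of the conversion; the helper names are hypothetical.

```python
def minted_to_org(firstline, lastline):
    """Inclusive minted range -> org ':lines' string (upper bound excluded)."""
    return f"{firstline}-{lastline + 1}"


def org_to_minted(lines_spec):
    """Org ':lines' string -> inclusive (firstline, lastline) pair.

    Assumes both bounds are given, as in "84-110"; org also accepts
    open ranges such as "-10" or "10-", which this sketch does not handle.
    """
    start, end = (int(part) for part in lines_spec.split("-"))
    return start, end - 1


print(minted_to_org(84, 109))   # 84-110
print(org_to_minted("84-110"))  # (84, 109)
```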

3. The Code

3.1. The modules

import argparse
import glob
import re

3.2. Finding the tagged line numbers in the python file

This function looks up the line numbers. It also drops trailing blank lines between the last line of python code and the closing comment tag.

def find_block_lines(python_lines, marker):
    """Return the line numbers of blocks of python code marked like this:

    # block marker
    import numpy as np

    # block marker

    Strip the lines that end with # block marker.
    Mind that inputminted starts counting at 1 while python starts at 0.
    Therefore we need to add +1 when returning the line numbers.
    """
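    tag = f"# block {marker}"
    # endswith (instead of "in") keeps a marker such as "tic"
    # from also matching the line "# block tictoc"
    indices = [
        idx
        for idx, line in enumerate(python_lines)
        if line.rstrip().endswith(tag)
    ]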
    if len(indices) != 2:
        return -1, -1

    # remove the lines with  # block marker string
    start_idx = indices[0] + 1
    end_idx = indices[1] - 1

    # remove trailing empty lines
    while end_idx > indices[0] and python_lines[end_idx].strip() == "":
        end_idx -= 1

    # Add one to convert python index to inputminted index
    return start_idx + 1, end_idx + 1
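An alternative sketch, not the code used in this post: counting from 1 with enumerate(..., start=1) removes the 0-based/1-based conversion, and returning None makes a missing tag harder to overlook than (-1, -1). The name block_range is hypothetical.

```python
def block_range(lines, marker):
    """Hypothetical variant of find_block_lines.

    Returns the 1-based (first, last) line numbers of the code between two
    '# block <marker>' lines, or None if the marker does not occur exactly twice.
    """
    tag = f"# block {marker}"
    # enumerate(..., start=1) yields the line numbers inputminted expects
    hits = [no for no, line in enumerate(lines, start=1) if line.rstrip().endswith(tag)]
    if len(hits) != 2:
        return None
    first, last = hits[0] + 1, hits[1] - 1
    # Skip blank lines just above the closing marker (lines[no - 1] is line no)
    while last >= first and not lines[last - 1].strip():
        last -= 1
    return first, last


sample = [
    "import sys\n",          # line 1
    "# block demo\n",        # line 2
    "import numpy as np\n",  # line 3
    "x = np.arange(3)\n",    # line 4
    "\n",                    # line 5
    "# block demo\n",        # line 6
]
print(block_range(sample, "demo"))  # (3, 4)
print(block_range(sample, "dem"))   # None
```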

3.3. Updating a \(\LaTeX\) file

This function looks up the tags mentioned in the \(\LaTeX\) file. In the for loop, it looks up the line numbers of the tagged code in the python file, then updates the \(\LaTeX\) file.

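def update_latex_file(latex_file):
    with open(latex_file, "r") as fp:
        latex_content = fp.read()

    # Collect (python file, tag) pairs from lines such as
    # \inputminted[firstline=12, lastline=23]{python}{python_file.py} % tictoc
    pattern = (
        r"\\inputminted\[firstline=\d+, lastline=\d+\]"
        r"{python}{(.*?\.py)}\s*%\s*(\w+)"
    )
    matches = re.findall(pattern, latex_content)
    for python_file, marker in matches:
        # The python file path is resolved relative to the current directory.
        with open(python_file, "r") as py_file:
            lines = py_file.readlines()
        begin_index, end_index = find_block_lines(lines, marker)
        print(python_file, marker, begin_index, end_index)
        if begin_index != -1 and end_index != -1:
            # re.escape the file name (it contains a "."), and require a word
            # boundary after the tag so "tic" does not match "% tictoc".
            pattern = (
                r"\\inputminted\[firstline=\d+, lastline=\d+\]\{python\}"
                r"\{" + re.escape(python_file) + r"\}\s*%\s*"
                + re.escape(marker) + r"\b"
            )
            # In a re.sub replacement "\\" becomes one backslash, but "\%"
            # would stay "\%" (a literal percent sign in LaTeX), so write "%".
            replacement = (
                rf"\\inputminted[firstline={begin_index}, "
                rf"lastline={end_index}]{{python}}{{{python_file}}} % {marker}"
            )
            latex_content = re.sub(pattern, replacement, latex_content)

    with open(latex_file, "w") as fp:
        fp.write(latex_content)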
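A replacement string passed to re.sub is itself scanned for backslash escapes, which is easy to trip over with LaTeX text. A sketch, assuming the line numbers are already known: passing a function to re.sub instead sidesteps the escaping, because the function's return value is inserted as-is. update_latex_text and the ranges dictionary are hypothetical names.

```python
import re

# Same layout as in this post:
# \inputminted[firstline=N, lastline=M]{python}{file.py} % tag
MINTED = re.compile(
    r"\\inputminted\[firstline=\d+, lastline=\d+\]"
    r"\{python\}\{(?P<file>[^}]+\.py)\}\s*%\s*(?P<tag>\w+)"
)


def update_latex_text(text, ranges):
    """Rewrite firstline/lastline for every tagged \\inputminted line.

    `ranges` is a hypothetical dict {(python_file, tag): (first, last)};
    in the script above these numbers come from find_block_lines.
    """

    def repl(m):
        key = (m["file"], m["tag"])
        if key not in ranges:
            return m.group(0)  # unknown tag: keep the line unchanged
        first, last = ranges[key]
        # A callback's return value is inserted as-is: no escape processing.
        return (
            rf"\inputminted[firstline={first}, lastline={last}]"
            f"{{python}}{{{m['file']}}} % {m['tag']}"
        )

    return MINTED.sub(repl, text)


doc = r"See \inputminted[firstline=1, lastline=2]{python}{python_file.py} % tictoc"
print(update_latex_text(doc, {("python_file.py", "tictoc"): (12, 23)}))
# See \inputminted[firstline=12, lastline=23]{python}{python_file.py} % tictoc
```

Because the whole document is rewritten in a single pass, this also avoids building a second, file-specific pattern for every tag.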

3.4. Updating an orgmode file

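Updating the org file works similarly. However, in org mode I write the tag after two hashes, like ## tag, because I noticed that org mode changes the % that the \(\LaTeX\) code above uses as a comment sign. Note also that org mode excludes the upper bound of a :lines range, which is why the function writes end_index+1.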

def update_org_file(org_filename):
    with open(org_filename, "r") as fp:
        org_content = fp.read()

    # Collect (python file, tag) pairs from lines such as
    # #+INCLUDE: "python_file.py" src python:lines "84-110" ## tictoc
    matches = re.findall(
        r'#\+INCLUDE: "(.*?)" src python:lines "\d+-\d+" ## (\w+)',
        org_content,
    )
    for python_file_name, marker in matches:
        with open(python_file_name, "r") as py_file:
            python_lines = py_file.readlines()
        begin_index, end_index = find_block_lines(python_lines, marker)
        print(python_file_name, marker, begin_index, end_index)
        if begin_index != -1 and end_index != -1:
            # re.escape the file name; \b stops "tic" from matching "## tictoc".
            pattern = (
                r'#\+INCLUDE: "' + re.escape(python_file_name) + r'" src '
                r'python:lines "\d+-\d+" ## ' + re.escape(marker) + r"\b"
            )
            # Org excludes the upper bound of :lines "a-b", hence end_index + 1.
            replacement = (
                f'#+INCLUDE: "{python_file_name}" src '
                f'python:lines "{begin_index}-{end_index + 1}" ## {marker}'
            )
            org_content = re.sub(pattern, replacement, org_content)

    with open(org_filename, "w") as fp:
        fp.write(org_content)
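Interpolating a file name straight into a regular expression only works as long as the name contains no regex metacharacters. The "." in python_file.py happens to be harmless, because "." also matches itself, but a "+" is not. A small demonstration with a hypothetical file name:

```python
import re

# Hypothetical file name containing "+", a regex metacharacter.
name = "solver+v2.py"
line = f'#+INCLUDE: "{name}" src python:lines "1-5" ## tictoc'

# Interpolating the raw name: "r+" now means "one or more r".
unescaped = rf'#\+INCLUDE: "{name}" src python:lines "\d+-\d+" ## tictoc'
# re.escape turns "+" and "." into literal characters.
escaped = rf'#\+INCLUDE: "{re.escape(name)}" src python:lines "\d+-\d+" ## tictoc'

print(re.search(unescaped, line))            # None
print(re.search(escaped, line) is not None)  # True
```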

3.5. Last steps

I want to be able to call the function on multiple files at once.

def process_files(pattern):
    for file_path in glob.glob(pattern):
        if file_path.endswith('.org'):
            update_org_file(file_path)
        elif file_path.endswith('.tex'):
            update_latex_file(file_path)
        else:
            print(f"{file_path} is not an .org nor a .tex file")

The main block reads the file names, or glob patterns, from the command line and updates the matching files.

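if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Update latex or org files based on Python markers."
    )
    # nargs="+" accepts both a quoted pattern such as "*.tex" and the list
    # of file names the shell produces when the pattern is not quoted.
    parser.add_argument(
        "file_patterns",
        nargs="+",
        help="One or more .tex or .org files, or patterns such as '*.tex'",
    )
    args = parser.parse_args()

    for file_pattern in args.file_patterns:
        process_files(file_pattern)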