So I added more print statements, but I still don’t know the cause of this issue. Is there no way to get more detail, like a stack trace, for “Server code exited unexpectedly”?
Try with smaller files. Perhaps it’s being killed because it’s using too much memory. No, I have no clue how much is too much, but testing with smaller files may help you understand what’s going on.
So I changed the code to batch inserts instead and flush the SQL session in between to prevent memory build-up. It lasts much longer but still gets killed. There are 7 tables being produced (inserted into), and it got through the first (and biggest) one but then died on the second (about 40 minutes in).
Is there a limit to the length of a background task? I thought there wasn’t for personal plans. Is there a way to monitor memory usage?
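In the meantime, one rough workaround I might try is logging the process’s own memory usage from inside the task. A minimal sketch using the standard-library resource module (note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS):

import resource

def log_memory(label: str):
    # Peak resident set size of this process so far (kilobytes on Linux, bytes on macOS)
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{label}: peak RSS so far ~ {peak}")

Calling log_memory() just before and after each flush would at least show whether memory keeps climbing between batches.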
So I added even more logs (too many, frankly) and suddenly it runs fine, but because of the amount of logging it is now a very slow script.
I reduced the logging and it started crashing again. I’ve had messages before saying it may be due to memory, but now that I’ve chunked what’s written to the db, that message is gone and instead I randomly get the message in the title, sometimes after a few seconds, sometimes after 30 minutes.
Is there no stack-trace equivalent I can use to investigate further? The error code is really quite unhelpful.
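One thing I might try in the meantime is wrapping the task body so that any ordinary Python exception at least prints its own traceback before the task dies. Just a sketch (run_import is a stand-in for my task’s entry point, and this obviously won’t help if the process is being killed from outside rather than raising):

import traceback

try:
    run_import()  # stand-in for the actual background task body
except BaseException:
    # Print the full traceback ourselves in case the platform only reports a generic error
    print(traceback.format_exc())
    raise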
OK - upon searching the forums, it appears this error code is an Anvil-specific error.
def xml_to_sql(root: ET.Element, model, element_name: str, data_elements: list, session: SqlAlchemySession):
    transaction = session.begin_nested()
    batch_size = 1000
    print("transactions started")
    try:
        session.query(model).delete()  # Clears out the table
        print("table cleared")
        column_types = {column.name: column.type for column in model.__table__.columns}
        probability_threshold = 0.001
        instances_added = 0
        element_count = 0
        for element in root.findall(f".//{element_name}"):
            data = {}
            element_count += 1
            if random.random() < probability_threshold:
                print(f"data element {element_count} - {element_name}")
            for data_element in data_elements:
                text_value = element.findtext(data_element, default=None)  # Extract text value for the data element
                # Convert the string to the appropriate type based on the model's column
                if data_element in column_types:
                    column_type = column_types[data_element]
                    if isinstance(column_type, Integer) and text_value is not None and text_value.isdigit():
                        data[data_element] = int(text_value)
                    elif isinstance(column_type, String):
                        data[data_element] = text_value
                    elif isinstance(column_type, Date) and text_value is not None:
                        try:
                            # Assuming the date format in your XML is 'YYYY-MM-DD'
                            data[data_element] = datetime.strptime(text_value, '%Y-%m-%d').date()
                        except ValueError:
                            # In case the date format is incorrect or the field is empty
                            print(f"Date conversion error for value '{text_value}'")
                            data[data_element] = None
                    else:
                        # For any other type that doesn't require special handling
                        data[data_element] = text_value
                else:
                    # If the element is not found in column types, use the default value
                    data[data_element] = text_value
            instance = model(**data)
            session.add(instance)
            instances_added += 1
            batch_size += 1
            if instances_added % batch_size == 0:
                session.flush()  # Flush the batch to the database but do not commit yet
                print(f"Flushed a batch of {batch_size} records.")
        # Commit the transaction only if all batches are processed successfully
        transaction.commit()
        print("All batches committed successfully.")
    except SQLAlchemyError as e:
        transaction.rollback()
        print(f"Error occurred: {e}. Rolled back the transaction.")
        raise
    finally:
        session.close()
The code above is always where the issue is.
The output looks like this:
trying xml_to_sql
transactions started
table cleared
data element 365 - AMP
data element 1018 - AMP
data element 2869 - AMP
data element 3375 - AMP
data element 3998 - AMP
data element 6758 - AMP
data element 7195 - AMP
data element 7447 - AMP
data element 8814 - AMP
data element 9447 - AMP
data element 10511 - AMP
data element 11408 - AMP
data element 14226 - AMP
data element 14367 - AMP
data element 14900 - AMP
data element 15357 - AMP
data element 16497 - AMP
data element 16995 - AMP
data element 17371 - AMP
data element 17735 - AMP
data element 18066 - AMP
anvil.server.ExecutionTerminatedError: Server code exited unexpectedly: 652e9c8fbf
It fails at a random point during the data element count. I’m trying very hard to make this platform work. I even signed up to the professional plan in the hope that increasing CPU/RAM would help, but it makes no difference.
Have you contacted Anvil support? Error codes like that are typically something that they need to look up in their logs to see what might be causing it.
Can I make a suggestion? If you haven’t already, try running this locally using Uplink and (a) see if it errors out, and (b) see whether it’s an outside-Anvil problem triggering an Anvil error or, indeed, an Anvil-specific issue. The Anvil team are regularly in the forum, so I wouldn’t worry too much about the paid-support issue.
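Roughly, the Uplink side would look something like this (just a sketch: you’d enable the Uplink for your app, paste in your own key, and move the heavy parsing/insert code into the local script so you can watch memory locally and get a full traceback):

import anvil.server

anvil.server.connect("YOUR-UPLINK-KEY")  # key from the app's Uplink settings

@anvil.server.callable
def xml_to_sql_local(xml_content, element_name):
    # Run the same parsing/insert logic here, outside Anvil's server environment,
    # so a crash gives you a local traceback (or an OS-level kill you can observe).
    ...

anvil.server.wait_forever()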
Without a paid support plan, you don’t get a guaranteed response time, but I would personally email them about an unhelpful error message like this (especially after you’ve made a good faith attempt to find the issue yourself and asked for help in the forum).
I don’t think so, not for paid plans. But I also haven’t heard of many 40+ minute background tasks. I may be wrong, though.
I’ve now managed to chunk the XML, which seems to have shifted the issue along:
def xml_to_sql(xml_content: str, model, element_name: str, data_elements: list, session: SqlAlchemySession):
    transaction = session.begin_nested()
    batch_size = 5000
    instances_added = 0
    record_count_before_clearing = session.query(model).count()
    current_record_count = copy(record_count_before_clearing)
    try:
        session.query(model).delete()  # Clears out the table
        print(f"table cleared - {element_name}")
        column_types = {column.name: column.type for column in model.__table__.columns}
        for event, element in iterparse(io.StringIO(xml_content), events=("start", "end")):
            if event == "end" and element.tag == element_name:
                data = {}
                for data_element in data_elements:
                    child = element.find(data_element)
                    text_value = element.findtext(data_element, default=None)  # Extract text value for the data element
                    # Convert the string to the appropriate type based on the model's column
                    if data_element in column_types:
                        column_type = column_types[data_element]
                        if isinstance(column_type, Integer) and text_value is not None and text_value.isdigit():
                            data[data_element] = int(text_value)
                        elif isinstance(column_type, String):
                            data[data_element] = text_value
                        elif isinstance(column_type, Date) and text_value is not None:
                            try:
                                # Assuming the date format in your XML is 'YYYY-MM-DD'
                                data[data_element] = datetime.strptime(text_value, '%Y-%m-%d').date()
                            except ValueError:
                                # In case the date format is incorrect or the field is empty
                                print(f"Date conversion error for value '{text_value}'")
                                data[data_element] = None
                        else:
                            # For any other type that doesn't require special handling
                            data[data_element] = text_value
                    else:
                        # If the element is not found in column types, use the default value
                        data[data_element] = text_value
                instance = model(**data)
                session.add(instance)
                instances_added += 1
                element.clear()
                if instances_added % batch_size == 0:
                    session.flush()  # Flush the batch to the database but do not commit yet
                    current_record_count -= batch_size
                    add_to_logs_table(f"Flushed a batch of {batch_size} records - {100 - round((current_record_count * 100) / record_count_before_clearing)}% done of {model.__tablename__}")
                    batch_size -= 1
        # Commit the transaction only if all batches are processed successfully
        session.flush()
        print(f"last flush for {element_name} completed")
        session.commit()
        print(f"All batches committed successfully - {element_name}")
    except SQLAlchemyError as e:
        transaction.rollback()
        print(f"Error occurred: {e}. Rolled back the transaction.")
        raise
    finally:
        session.close()
Now when I run this, I get the “Server code exited unexpectedly” error at the point of session.commit().
I don’t believe it’s a memory issue, as I have seen Anvil’s separate memory error before, but the fact that chunking the XML shifted the issue implies that it may be memory-related.
Can anyone at Anvil look at the logs and point out the issue? I’d also strongly suggest making the errors more specific than this, so that users can self-diagnose issues.
I just finished reading this thread from top to bottom, and you have done everything that I would do, including moving to iterparse().
I am not entirely sure, but my sneaking suspicion falls on SQLAlchemy here.
Since you are creating a new variable called instance with its own Python reference count, then adding it to the session, it is possible that SQLAlchemy’s .flush() is not de-referencing the variable. This keeps it in memory in Python even after it is sent to your database, even though SQLAlchemy would not normally do this.
Normally it would only retain links to the data in the DB for a rollback or a failure to commit, or its row statuses, etc.
If you want, you could try abandoning the batching altogether and, since you are already doing iterparse(), try flushing an unreferenced copy of the data to see if that makes a difference.
I can’t say what it will do to overall speed with so many DB insert calls, but you will at least know if it is possible / what the memory problem is.
like:
## Instead of:
# instance = model(**data)
# session.add(instance)
# instances_added += 1
# element.clear()
## Do in each iterated loop:
session.add(model(**data))
session.flush()
element.clear()
This is not ideal, but it might get you further along with the problem solving.
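Another variant you could try (again just a sketch, based on assumptions about your models rather than anything I’ve tested): explicitly drop the session’s references after each flush with expunge_all(), or skip ORM instances entirely and insert the plain dicts through Core, which never enter the identity map:

# Option 1: keep batching, but detach flushed objects so the session stops referencing them
if instances_added % batch_size == 0:
    session.flush()
    session.expunge_all()  # the rows stay in the open transaction; only the Python objects are released

# Option 2: bypass ORM instances and insert the dicts you already build via Core
batch = []
# ... inside the loop, instead of model(**data): ...
batch.append(data)
if len(batch) >= batch_size:
    session.execute(model.__table__.insert(), batch)  # executemany of plain dicts
    batch.clear()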
I would add that before I was using flush(), I was getting memory issues from Anvil on smaller XMLs before processing even 50% of the data. Once flush() was implemented, the issue shifted to only happening on load of the largest XML, which is where I introduced iterparse(), and the problem changed to being generic (i.e. no mention of memory) and happened on the commit.
This only happens with the largest of the 7 XMLs that get processed, and only at the commit().
The point I was trying to make was that when the session.add() instances were not being dereferenced, the issue happened much earlier and on smaller XMLs.
Now you have me wondering whether your issue really is the Anvil server swallowing up an error from SQLAlchemy.
Could there just be too much data held in memory by your database that has not been written to disk (i.e. before the commit command is sent), with Anvil misreporting the error that gets raised?
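If that’s what’s happening, one crude way to test it (just a sketch, and it does change the semantics: a failure part-way through would leave a partially rebuilt table instead of rolling back cleanly) would be to commit each batch instead of holding everything in one transaction:

if instances_added % batch_size == 0:
    session.commit()  # write the batch out and let the database release the pending transaction
    print(f"Committed a batch of {batch_size} records - {element_name}")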
The only reason I know anything about this at all is that at a previous employer I ended up automating a pipeline to ETL large (sometimes several GB) XML files of automotive fitment data, doing essentially what you are trying to do.
I also remember it being a pain, with memory and row-commit problems when upserting it externally…
@ianb - The db is fine; I’m barely touching its capabilities. I tried it locally with larger files. In addition, the db performance scales automatically, and it has never had to scale at all.
I tried your approach; it takes about 50% longer to reach the same point, where again it crashes, this time with the out-of-memory message. However, I realised it never reached the print just before the commit. When I went back to my previous iteration, again it never reached the print before the commit.
What this tells me is that there is a problem after batch_size -= 1, where it is exiting this loop:
for event, element in iterparse(io.StringIO(xml_content), events=("start", "end")):
I’m not sure why that would cause an issue, or why the out-of-memory error only happens if I flush for every instance.
Hmm, I am out of ideas, I think. I’m guessing your XML is relatively uniform, so I doubt it’s one particularly large element tree to parse somewhere towards the end of the file… Even if it were, you should see problems with other files, regardless of size, if they had particularly large elements somewhere within.
So I don’t have anything else. I don’t actually know what your data looks like, but I also try to use a different library to handle XML wherever I can, since I strongly dislike the Python built-in one.
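For example, lxml’s iterparse lets you clear not just the current element but also already-processed siblings hanging off the root, which is the usual trick for keeping memory flat on big files. A sketch, assuming lxml is available in your environment and reusing your xml_content / element_name names:

import io
from lxml import etree

# lxml wants bytes rather than str, hence the encode
for event, element in etree.iterparse(io.BytesIO(xml_content.encode("utf-8")),
                                      events=("end",), tag=element_name):
    # ... build the data dict and session.add(...) exactly as before ...
    element.clear()
    # Drop already-processed siblings that the root element still references
    while element.getprevious() is not None:
        del element.getparent()[0]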