Ok, my solution doesn't work. I'm running out of memory while performing this loop; it happens after 20,000-22,000 rows.
I split my list of dicts into chunks of 2,000.
I do some processing and create a new dict to pass the selected data to add_rows (adding a unique ID from a different function, converting the string date into a date).
My list of dicts holds even more data that I will later add to a different database and link to the rows created with the code below, so I already fail in this first phase.
```
import math
from datetime import datetime

# --- start of the import preparation ---
id_result_index = 0  # index to track the current result ID

# Determine the number of chunks needed
chunk_size = 2000
num_chunks = math.ceil(len(list_of_dicts) / chunk_size)

# Process the list in chunks
for i in range(num_chunks):
    start_index = i * chunk_size
    end_index = min(start_index + chunk_size, len(list_of_dicts))
    chunk = list_of_dicts[start_index:end_index]

    # Final changes per chunk: add the result index, link the application
    # to the databank table, and convert the string date into a date
    final_result_list = []  # the rows to import for this chunk
    for row in chunk:  # iterate over the list of dictionaries
        application_value = row.get('application')
        if application_value is not None:
            # One database query per row -- this is the part that ran out of memory
            found_row = app_tables.assay_databank.get(application_code=application_value)
            row['result_id'] = result_ids[id_result_index]  # assign the generated ID
            id_result_index += 1  # move to the next ID
            if found_row:
                row['application'] = found_row  # overwrite the value with the row for linking
            else:
                some_failed.append(f"- application not found {application_value}")
                row['application'] = None  # clear the missing application to avoid a data mismatch

        # Convert the string date into a date object
        measurement_date_str = row.get('measurement_date')
        if measurement_date_str is not None:
            row['measurement_date'] = datetime.strptime(measurement_date_str, '%Y-%m-%d').date()

        # Build the final result row: unique sample ID, result ID, user and datetime
        if application_value is not None:
            new_item = {
                'sample_id': row.get('sample_id'),
                'result_id': row.get('result_id'),
                'application': row.get('application'),  # already a row of the databank table
                'concentration': row.get('concentration'),
                'unit': row.get('unit'),
                'device': row.get('device'),
                'measurement_date': row.get('measurement_date'),  # date object
                'updated_by': user,
                'updated_date': datetime.now()
            }
            final_result_list.append(new_item)

    # Import the results for the current chunk
    app_tables.results_database.add_rows(final_result_list)
```
Ok, I rewrote the whole code just to improve the performance. What was causing the out-of-memory error was the single database query for each row.

My solution was:
- using slicing to create the chunks
- removing all the per-row database queries in favour of one single query, built as a dict of keys and values, where each key is a column name in the database and each value is the searched unique ID generated in code: `app_tables.database.search(**dict_search)` (a minimal sketch of this follows below)
- then, in the final preparation of the list of rows for the import (`final_sample_list`), assigning the link to the other data table based on the new list of rows from that single query
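Here is a minimal sketch of that single query, assuming Anvil's `anvil.tables.query` helpers; `results_database`, `sample_id` and the barcode values just mirror the snippet below, so swap in your own table and column names:

```
import anvil.tables.query as q
from anvil.tables import app_tables

# One query for the whole import instead of one query per row.
# The unique IDs were generated in code, so collect them all first...
searched_ids = [row['barcode'] for row in list_of_dicts]
# ...then search with a dict of column name -> query, as described above
dict_search = {'sample_id': q.any_of(*searched_ids)}
found_rows = list(app_tables.results_database.search(**dict_search))
```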
```
for row in chunk:  # assign the results to the sample
    # Find the matching rows from the single query based on row['barcode']
    matching_rows = [found_row for found_row in found_rows
                     if found_row['sample_id'] == row['barcode']]
    # Build the sample row, linking the matching result rows
    new_item = {
        'barcode': row['barcode'],
        'study_id': row['study_id'],
        'name': row['name'],
        'status': row['status'],
        'matrix': row['matrix'],
        'ext_barcode': row['ext_barcode'],
        'volume': row['volume'],
        'comment': row['comment'],
        'order_number': row['order_number'],
        'location_storage': row['location_storage'],
        'box_bag': row['box_bag'],
        'position': str(row['position']),
        'results': matching_rows,
        'created_by': user,
        'created_date': datetime.now(),
        'updated_by': user,
        'updated_date': datetime.now()
    }
    quick_link_samples.append(row['barcode'])  # quick links for the quick search in samples
    final_sample_list.append(new_item)

# Import the samples for the current chunk
app_tables.sample_database.add_rows(final_sample_list)
```
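A possible further refinement (just a sketch on top of the code above, not something I have benchmarked): the list comprehension rescans `found_rows` once per sample, so grouping the queried rows by `sample_id` up front turns each lookup into a single dict access. `index_by_sample` is just an illustrative helper name:

```
from collections import defaultdict

def index_by_sample(found_rows):
    """Group the queried result rows by sample_id for O(1) lookup per barcode."""
    results_by_sample = defaultdict(list)
    for found_row in found_rows:
        results_by_sample[found_row['sample_id']].append(found_row)
    return results_by_sample

# inside the chunk loop, replacing the list comprehension:
# matching_rows = results_by_sample.get(row['barcode'], [])
```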