Showing posts with label Uncertainty. Show all posts

Thursday, July 23, 2020

COVID-19 models including reopening and second wave

Reopening of many countries has begun and testing rates have also ramped up.  Some countries are now better able to detect infections and the age profile of infected people has in many cases shifted towards younger people.  Some countries have the outbreak under good apparent control while others have not yet seen even the first peak.  

We have updated our models for Ireland, the US and Italy to include the latest ECDC data.  On the basis of our modeling assumptions, and in the absence of further closures/ measures, Italy may not see a second wave for a considerable period.  Ireland may see this second wave sooner (and authorities will act to control it).  

A population (whole country) based model of any country is an approximation, as there are always localized effects, especially when applied to large countries like the US.  For example, cities in the North East had large first waves and are having relatively small second waves on reopening.  Cities in other regions had small first waves that never really fell away and are now seeing the effects of those waves continuing to build after somewhat earlier reopening.  Viewed as a whole, the US clearly has a second wave, but for the above reasons that terminology is not necessarily accurate locally.

Model fit to death rates in the US as a whole, up to 23 July 2020.  Model predictions indicate that a second peak in death rate has started and tightening of measures is required to limit its severity.

We have included in the latest models parameters to reflect increased detection by testing (as positive test rates are reducing in many nations) and also reduced mortality as the age profile shifts.  Those extra parameters add uncertainty but are necessary in order to continue to describe the evolving situation well.  On the basis of the model and its prior versions referenced in previous posts, parts of the US will need to continue to take all possible measures to limit the spread of COVID-19 in order to prevent a significantly larger death toll.

Sunday, June 14, 2020

COVID-19 second wave risk: models for Ireland, Italy, US and Singapore

We've taken another look at ECDC data to date and updated the models for the US, Italy and Ireland with data to 13 June.  We've also added a model fitted to Singapore data.

To date, the same model structure has fitted every region and inferred rates of physical contact between those infected with the disease and the broader community have tallied qualitatively with mobility data from Apple and Google.  That continues to be true for Ireland, Italy and the United States.  

Now that many countries are easing restrictions, the risk of a second wave of infections is growing.  In the case of the US, model parameters are tending to suggest that a second wave could be nearer than in the other regions.  Every feasible way to reduce transmission should be considered and it is possible that relatively small measures such as widespread wearing of masks could make the difference between a manageable and unmanageable second wave.

We've added some factors to the scenarios tabs of the models that define how restrictions are lifted.  That makes it easier for you to change those (in Simulator>Set Parameters) and you can use them to make response surfaces.  We took Ireland as an example and estimated the number of reported deaths at end 2020 for a variety of 'new normal' levels of contact (m_normal) and periods over which we adjust from lockdown to those levels (t_normal).  
Response surface (contour plot) for projected Year End 2020 reported deaths in Ireland from COVID-19, as a function of 'new normal' levels of movement / transmission (m_normal) relative to baseline and the length of time over which we move from lockdown to new normal (t_normal).  While there is some sensitivity to t_normal, the main sensitivity is to m_normal (vertical axis).  Every reasonable step should be taken to keep this value as low as possible.
Here's hoping that our greater knowledge about COVID-19 puts us in a position to avoid the less favourable regions on this diagram.

We added the Singapore model at the request of a customer there.  Singapore was highly praised in the early stages of the outbreak, with a small number of cases, extensive testing and vigilance and a very low mortality rate from COVID-19.  However, an outbreak in dormitories used by migrant workers led to a spike in cases and a period of lockdown referred to as the 'circuit breaker'.  Mortality rates remain low and this is attributed to both compliance and the young age profile of the majority of cases / migrant workers.  To fit the data, we had to allow for a burst of increased contact (the outbreak in dormitories) and the model now fits the data well.

All models are available here as usual.  All you need to run them is the Excel file and Dynochem.

Sunday, May 10, 2020

COVID-19 exit strategies from lockdown

We have been using a 'mixing' variable to represent the effects of non-pharmaceutical measures / interventions on the rate of exposure to COVID-19:
The rate of change of the number of people currently exposed depends on i) the relative contact rate between infectious and susceptible and ii) the probability of transmission during contact; these are reflected together in the 'mixing' term
This parameter reflects the relative contact rate between infectious and susceptible and also the probability of transmission during contact.  Baseline mixing = 1.0, i.e. what we did before the outbreak.  To date, this variable has been reduced by lockdowns and closures (reducing our contact rate) and physical distancing, handwashing, mask wearing (reducing the probability of transmission).  When we are exiting lockdowns, our contact rate will return gradually towards the baseline but if we maintain or improve current standards of physical distancing, handwashing and mask wearing, the mixing term in the model will not return to 1.0.  This will be very important in preventing large subsequent waves of infection.
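For readers who prefer equations, the dependence described above can be written in conventional SEIR notation. This is a sketch under assumed symbols, not the Dynochem model's own equations: m(t) is the mixing variable, β the transmission rate at baseline mixing, σ the rate at which exposed people become infectious, N the population, and R_0 = β/γ with γ the rate of leaving the infectious compartment. Under this form, the effective reproduction number scales in direct proportion to mixing.

```latex
\frac{dS}{dt} = -\,m(t)\,\beta\,\frac{S\,I}{N}, \qquad
\frac{dE}{dt} = m(t)\,\beta\,\frac{S\,I}{N} - \sigma E, \qquad
R_{\mathrm{eff}}(t) = m(t)\,R_0, \quad m = 1 \text{ at baseline}
```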

As noted in last week's update, Ireland, Italy and the United States have reduced infection rates during lockdown by at least two thirds.  Some of the benefit came from reduced contact and some from reduced probability of infection; at this time it is difficult to split the effects accurately; however it is likely that the reduced probability accounts for a significant portion of the effect.

In addition to including the latest data in our models this week, we have updated the lines (on the process worksheet) that allow simulation of the effect of relaxing the measures that reduced contact.  Most countries will relax measures over a period of several months starting soon, while maintaining focus on hygiene and distancing and in many cases encouraging the wearing of masks.  Our models currently simulate a linear relaxation of measures, though in practice these will occur in 'steps' or 'phases'.  We also include the option of a return to current lockdown for a short period at any time.  Users can define the ultimate 'relaxed' value of the mixing variable and to begin with we have set this to 45% of baseline.  We chose this value in the hope that measures affecting probability of transmission will be maintained and that they are highly effective; and that as many people as possible will be able to continue to work from home.
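A minimal sketch of such a mixing schedule is shown below, assuming a straight-line move from the lockdown level to the relaxed level plus optional windows of temporary return to lockdown. The names m_normal and t_normal follow the scenario parameters used in our later models; t_start, m_lock and lockdown_windows are hypothetical names for illustration, and this is not the Dynochem implementation.

```python
def relaxation_mixing(t, t_start, t_normal, m_lock, m_normal,
                      lockdown_windows=()):
    """Mixing level at day t for a linear exit from lockdown.

    t_start: day relaxation begins (hypothetical parameter)
    t_normal: days taken to move from m_lock to m_normal
    lockdown_windows: (begin, end) day pairs with a temporary return to m_lock
    """
    for begin, end in lockdown_windows:
        if begin <= t < end:
            return m_lock
    if t <= t_start:
        return m_lock
    if t_normal <= 0 or t >= t_start + t_normal:
        return m_normal
    fraction = (t - t_start) / t_normal
    return m_lock + fraction * (m_normal - m_lock)


# Hypothetical schedule: lockdown at 28% of baseline, relaxing from day 100
# to 45% over 90 days, with a three-week return to lockdown from day 250.
schedule = [relaxation_mixing(day, 100, 90, 0.28, 0.45, [(250, 271)])
            for day in range(0, 366)]
```

In practice measures are lifted in steps or phases, so a straight line is a smoothed stand-in for a staircase.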

Example projections are shown below for the US as a whole, with linear relaxation of lockdown measures to mixing = 45% of baseline (inset) over a period of months.  Even with these potentially optimistic parameter values, a very large second wave of infections could occur before the end of the year.  It seems likely therefore that in order to control the level of infection, it will be necessary to have periods during which we return to current levels of lockdown.
Model fit to known cases and reported deaths to date for the US (symbols and curves of the same colour).  Then projecting to end 2020 with a linear relaxation of restrictions on movement [inset], to a new stable mixing level of 45%.

Sunday, May 3, 2020

COVID-19 models at 3 May Ireland Italy and US

Our models have been updated to include the data from Ireland, Italy and the US for the week just ended.  Trends and parameters are stable in all regions.  We do not adjust R0 in any region (this was fitted to the early 'exponential' data) but we refit the 'mixing' profiles each week and values have been steady for some time.  Since 11 April, the models also predict death rates and the related fitted parameters are also steady.

Of the three regions, R0 may have been highest in the US (3.92) and lowest in Italy (2.94), with Ireland in between (3.52).  We estimate that R_effective (or Rt) after adoption of restrictions is now 1.1 in the US, 0.98 in Italy and 0.84 in Ireland.  These figures translate to mixing levels of 28%, 33% and 24% of baseline respectively in the periods of most severe 'lockdown'.
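The figures above are consistent with R_effective being the product of R0 and the mixing level, the multiplication used to form an 'effective R' in epidemiology. The short script below treats R_eff = R0 × mixing as an assumption, checks the quoted values, and also computes the mixing level at which R_eff would equal 1 in each region.

```python
# Values quoted above: (R0, fitted lockdown mixing, quoted R_effective)
regions = {
    "US": (3.92, 0.28, 1.10),
    "Italy": (2.94, 0.33, 0.98),
    "Ireland": (3.52, 0.24, 0.84),
}


def r_effective(r0, mixing):
    """Effective reproduction number, assuming R_eff = R0 * mixing."""
    return r0 * mixing


def mixing_for_control(r0, target=1.0):
    """Highest mixing level that keeps R_eff at or below the target."""
    return target / r0


for name, (r0, mixing, quoted) in regions.items():
    print(f"{name:8s} R_eff = {r_effective(r0, mixing):.2f} (quoted {quoted:.2f}); "
          f"mixing for R_eff = 1: {mixing_for_control(r0):.0%}")
```

The US and Ireland values reproduce to two decimal places; Italy comes out at 0.97 against the quoted 0.98, which is within the rounding of the 33% mixing figure.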

In order to predict the safest and most beneficial ways to release restrictions, data on the incremental effectiveness of each separate measure (e.g. social distancing, wearing masks, cocooning the vulnerable, staying at home, closing shops and places of work) would be extremely useful but are not yet available.  We may learn these parameters as the outbreak progresses and be better able to predict exit strategies.

In the meantime, Google and Apple are providing anonymized data related to movement, recorded when our devices talk to their servers.  In general, these data trend in line with the mixing percentages we estimate from the predictive model.  Results are shown below for the United States as an example. Though the mobility data are scattered, the trends align with the predictive model and indicate that mobility data are helpful indicators of progress.

Mobility data from Apple devices compared to the mixing variable fitted to COVID-19 case data for the US.  Though the data are scattered, the trends align with the predictive model.

Sunday, April 26, 2020

COVID-19 models at 26 April Ireland Italy and US

Our predictive models for Ireland, Italy and the US are stable and identical for each region, differing only in fitted parameter values (e.g. R0 and people movement versus time).  As the progress of COVID-19 is slowed by restrictions, the fitted parameter values for people movement have become more consistent from week to week.  Models with data to 26 April 2020 are available here as usual.
Good progress is being made and restrictions will be gradually loosened and occasionally re-tightened over the coming weeks and months, depending on trends in case numbers. 

Data for Ireland remains harder to interpret because of delayed testing results and some changes in the basis for reporting deaths this week.  A graph showing backdating of test results was shared by the government on 23 April 2020 and we have used this to update the Ireland model; however, there are spurious peaks (see below) and, in addition, focused testing in care/nursing homes generated high case numbers this week that probably do not reflect a trend.  We therefore did not do a tight fit to the Ireland data.
Known case data for Ireland reported by the ECDC and a backdated version of the same information published on Thursday this week.  Both versions contain spurious trends that make direct use for parameter estimation difficult.
Approximate model fitted to backdated Ireland data indicates that the number of infectious in the first wave probably peaked around 1 April 2020.
Over the next weeks we will explore developing a more granular predictive model for the loosening of restrictions, especially focusing on the degree to which communities can return towards normality while those especially vulnerable remain protected. 

Sunday, April 19, 2020

COVID-19 at 19 April Ireland Italy and US

Each of the regions we have been modeling continues to make good progress against COVID-19, and some relaxation of the restrictions on people movement is being discussed or carefully applied with a view to avoiding second and later waves.

The numbers of infectious appear to have peaked clearly in Italy and the US.  The Ireland data has been harder to read because of around 3000 positive test results delivered three weeks late;  without a definitive restatement of the correct dates for those case numbers, it is difficult to be confident of the Ireland parameters at this time.

Models with current data are available as usual here.  Sample results are shown below.  As usual in the graphs below, discrete symbols are measured data (cases and deaths) and curves of the same colour are model predictions of those data.

Italy:

Predictive model fit to known cases and reported deaths from Italy indicates a peak in the number of infectious in mid March.

US:

Predictive model fit to known cases and reported deaths from the US indicates a peak in the number of infectious at end March.

Sunday, April 12, 2020

COVID-19 models for Ireland, Italy and US - 11 April

Thanks again for all the positive feedback on this work.

In this week's update we have included data to 11 April for each region.  The structure of the predictive model is the same as before, with the additional calculation of deaths from the outbreak now included.  We have also simplified and we hope improved the workflow for application to other regions, by replacing the manual adjustment of an 'imposed' population mixing profile with use of the Dynochem Fitting window to fit a 'piecewise-linear' mixing profile to all case data (second scenario in the model).
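A 'piecewise-linear' profile is simply a set of (day, mixing) knots joined by straight lines. The helper below is a hypothetical sketch of how such a profile can be evaluated at any time; in the fitting set-up the knot values would be the adjusted parameters, but this function only evaluates a given profile and is not Dynochem's implementation.

```python
from bisect import bisect_right


def piecewise_linear(knots):
    """Return a function m(t) interpolating (day, mixing) knots linearly.

    Values are held constant before the first knot and after the last one.
    """
    days = [d for d, _ in knots]
    values = [v for _, v in knots]
    if any(b <= a for a, b in zip(days, days[1:])):
        raise ValueError("knot days must be strictly increasing")

    def mixing(t):
        if t <= days[0]:
            return values[0]
        if t >= days[-1]:
            return values[-1]
        k = bisect_right(days, t)  # days[k - 1] <= t < days[k]
        t0, t1 = days[k - 1], days[k]
        v0, v1 = values[k - 1], values[k]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    return mixing


# Hypothetical knots: free mixing to day 25, falling to 40% of baseline by day 35.
m = piecewise_linear([(0, 1.0), (25, 1.0), (35, 0.4), (60, 0.4)])
```

Holding the end values constant means projections beyond the last knot continue at the most recent fitted level unless a scenario says otherwise.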

The models (available here) fit very well to both cases and deaths in each region (first scenario for initial period and second for the outbreak to date).  The now usual caveats apply about known coronavirus case numbers: they lag and obscure real case numbers and their meaning varies depending on the testing criteria, volume and delay in each region and over time. Case data for Ireland has been muddled in recent days by the addition of results from swabs taken over the last month that took several weeks to test; we have attempted to reconcile this unhelpful number over the relevant period and refitted parameters for Ireland on that basis.

Each region's degree of control over the outbreak may be assessed by the current effective 'R', taking account of its original value (R0) and the restrictions.  R_effective should be near or preferably below 1 before restrictions can be safely relaxed for a short period (when it will rise).

There are many 'peaks' in such an outbreak, and to say 'it has peaked' requires a more specific definition of the type of peak.  While the infection rate may already have peaked (in the first wave), detection lags behind it and so peaks later; death rates also lag and peak later again.

Various peaks that occur in an outbreak like COVID-19, using model data for Ireland as an example.  Natural time lags as well as testing delays may cause peaks in infection, detection, death (on secondary y-axis) and ICU bed occupation rates to occur at different times.

As usual in the graphs below, discrete symbols are measured data (cases and deaths) and curves of the same colour are model predictions of those data.  'Mixing' reflects the reduced interaction of the population with each other.

Ireland:

Ireland: Including the burst of old case data results received this week, Ireland looks to have a little further to go before the first wave of the outbreak places peak demand on health services.  In order to bring that peak into April, the public will need to observe restrictions more tightly than at present.  Estimated current R_effective=1.1.  The peak infection rate may have occurred around 3 April.

Italy:

Italy: The first wave of the outbreak has recently passed the point of peak demand on health services. Peak infection rate was probably around 19 March. Estimated current R_effective=0.97.

United States (as a whole):

US: Peak infection rate may have occurred in early April.  Peak demand on the health services could be in late April or early May.  R_effective appears similar to Ireland at approximately 1.1.

Saturday, April 4, 2020

Coronavirus projections for Ireland, Italy and US

It is generally accepted that reported / known COVID-19 cases represent only a fraction of real cases; we continue to adopt a figure of about 22% known cases cumulatively for the outbreak as a whole.  Reported case numbers also depend on the criteria for testing, the volume of tests, the time taken to produce a test result and test accuracy;  all of these factors are in flux to some degree in most regions.  However known case data remains the best information that we have to assess the state of the outbreak and we continue to use those data in the models today.

Our model 3 (available here) tracks the population and predicts the numbers who have been exposed, infected, isolated and cured.  It combines the (testing) time lag of detection relative to infection that we used in model 1 with the 22% cumulative detection rate we used in model 2, and represents a small evolution that we think improves its correspondence with reality.  We see this as a fit-for-purpose model, and future postings are unlikely to change the model much, only to apply it to the latest data.

Results to date and projected to the end of 2020 are summarized below for Ireland, Italy and the United States.  Projections support the view that the month of April may see a peak in demand for ICU beds in each of those regions if the current restrictions are maintained and observed by the public.  Note that in each case, the estimated number of ICU beds needed is based on the number of known cases currently in isolation and assumes that 5% of those require an ICU bed.

The question of how to emerge from the pandemic is difficult, with indications that general restrictions similar to those currently in place cannot be relaxed much for the foreseeable future.  As usual in the plots, discrete symbols like 🔺indicate measured / real reported case data (denoted in the legend as 'Exp' for 'experimentally measured') and curves indicate model predictions.  When curves pass through symbols of the same colour, the model agrees well with measured data.  The 'mixing' variable [inset] is an indicator of the extent of people movement, with 1.0 as the base case (before restrictions) and lower values after restrictions have taken effect.

Ireland:

Ireland: If current restrictions remain in place [inset] and are observed by the public, the model suggests that peak ICU demand will be reached around day 63, between 20-27 April.  Relaxation of restrictions [inset] will lead to further peaks later in the year.

Italy:

Italy: If current restrictions remain in place [inset] and are observed by the public, the model suggests that peak ICU demand will be reached near day 61, around 13 April.  Relaxation of restrictions [inset] will lead to further peaks later in the year.

United States (as a whole):

United States: If current restrictions remain in place [inset] and are observed by the public, the model suggests that peak ICU demand will be reached around day 93, between 20-27 April.  Relaxation of restrictions [inset] will lead to further peaks later in the year.

Saturday, March 28, 2020

Coronavirus projections: model 2

Thanks for the positive response to our initial work on a COVID-19 prediction model.  Here are the results of a second iteration in model development.  If you are a Dynochem user, you can also download and run the model from our COVID-19 site.

A challenge with predictions for the outbreak is that we only have reported/ known case data for parameter estimation.  The first model assumed that all infected cases would ultimately be known cases and that detected cases were equivalent to reported known case data.  In our updated model today, detected cases are no longer simply a lagged copy of infected cases; instead, the model accounts for the fact that a significant portion of infected cases are never reported in known case data.

The fraction that defines this relationship was estimated in a recently published analysis of the outbreak in China: using a complex model that included movement between Chinese cities, the authors estimated that known cases represented only 14% of the total; however, they also estimated that the ability of unknown cases to transmit the disease was 55% of that of known cases.  It is difficult to see why an unknown case would be less infectious.  If we assume that all infectious people are equally infectious (on average) whether their case is known or not, this leads to an estimate, based on the China data, that about 22% of cases are known.  We have also taken estimates of the incubation period and infectious period from the China data.
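The arithmetic behind the ~22% figure is not spelled out above. One reading that gives a similar number, offered here as an assumption rather than the calculation actually used, is to weight unknown cases by their estimated relative infectiousness and ask what share of that transmission-weighted total the known cases make up.

```python
def effective_known_fraction(known_frac, unknown_rel_infectivity):
    """Share of transmission-weighted infections that are known cases.

    Assumed reading: unknown cases count with weight unknown_rel_infectivity,
    known cases with weight 1.
    """
    unknown_weighted = (1.0 - known_frac) * unknown_rel_infectivity
    return known_frac / (known_frac + unknown_weighted)


print(f"{effective_known_fraction(0.14, 0.55):.3f}")
```

With 14% known and 55% relative infectiousness this gives about 0.23, close to the ~22% adopted in the model; if unknown cases were fully infectious the result would fall back to 14%.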

To account for the above treatment of known cases for parameter estimation, the predictive model now tracks the numbers Susceptible, Exposed, Infectious, Isolated and Cured, together with known cases.
This is similar to what is known as the SEIR model, except that here the 'Recovered' compartment is split into Isolated and Cured, for the purposes of estimating ICU capacity requirements. We have not included mortality calculations. The concept of known cases (cumul_detected) is linked to the number of infected by the above fraction (test_frac=22%) and the time from test request to test result.  The 5% requirement for ICU beds is 5% of known cases.  Model assumptions are noted on the Process Scheme worksheet together with some notes about possible future changes.  The model includes a Notes tab explaining how to apply it to a specific region.
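To illustrate the structure described above, here is a minimal sketch of an S, E, I, Isolated, Cured model integrated with forward Euler steps. All parameter values are illustrative assumptions, not the fitted Dynochem values; the testing time lag is left out; and treating known isolated cases as test_frac times the Isolated compartment is our assumption for the ICU estimate (5% of known cases).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Params:
    """Illustrative values only; not the fitted Dynochem parameters."""
    r0: float = 3.0               # basic reproduction number at baseline mixing
    incubation_days: float = 5.2  # mean time Exposed before becoming Infectious
    infectious_days: float = 3.0  # mean time Infectious before Isolation
    isolation_days: float = 14.0  # mean time Isolated before Cured
    test_frac: float = 0.22       # fraction of cases that become known
    icu_frac: float = 0.05        # share of known isolated cases needing an ICU bed


def simulate(p: Params, population: float, infectious0: float, days: int,
             mixing: Callable[[float], float] = lambda t: 1.0,
             dt: float = 0.1) -> List[Dict[str, float]]:
    """Forward-Euler integration of S -> E -> I -> Isolated -> Cured.

    Returns one row per day; dt should be 1/n for a whole number n.
    """
    beta = p.r0 / p.infectious_days   # transmission rate at mixing = 1
    sigma = 1.0 / p.incubation_days   # E -> I
    gamma = 1.0 / p.infectious_days   # I -> Isolated
    delta = 1.0 / p.isolation_days    # Isolated -> Cured
    s, e, i, q, c = population - infectious0, 0.0, float(infectious0), 0.0, 0.0
    steps_per_day = int(round(1.0 / dt))
    rows = []
    for k in range(days * steps_per_day + 1):
        t = k * dt
        if k % steps_per_day == 0:
            known_isolated = p.test_frac * q  # assumption: known share of Isolated
            rows.append({"day": k // steps_per_day, "S": s, "E": e, "I": i,
                         "isolated": q, "cured": c,
                         "icu_beds": p.icu_frac * known_isolated})
        infections = mixing(t) * beta * s * i / population
        flow_ei, flow_iq, flow_qc = sigma * e, gamma * i, delta * q
        s -= infections * dt
        e += (infections - flow_ei) * dt
        i += (flow_ei - flow_iq) * dt
        q += (flow_iq - flow_qc) * dt
        c += flow_qc * dt
    return rows


if __name__ == "__main__":
    n = 4.9e6  # hypothetical population
    free = simulate(Params(), n, infectious0=100, days=365)
    peak = max(free, key=lambda row: row["icu_beds"])
    print(f"unrestricted: peak ICU demand ~{peak['icu_beds']:.0f} beds on day {peak['day']}")
```

Passing a time-varying mixing function (for example a lockdown profile) in place of the default constant is how on/off restriction scenarios would be explored in this sketch.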

We have applied the model to both Ireland and Italy today and results are shown below.
Model 2 fit for Ireland data, with timeframe to end March 2020 (Day 0 = February 20).
Curves are model predictions and symbols are observed data.  Plot units are in brackets on the legend.
There is a clear indication of how measures have changed the trend in the numbers Exposed and Infectious.
Potential impact of on/off restrictions in movement for Ireland to end December 2020.  With further limits on people movement [inset], peak ICU needs could be reached in April or June, depending on restrictions.
The (future) predictions in both cases for Ireland and Italy are optimistic in that they assume a further tightening or increased effectiveness of social distancing / restrictions, compared to what we have been able to achieve to date, plus repeated application of those measures for months ahead. There is evidence that Italy has reduced contact to 40% of base level, while Ireland has reduced contact to 50% to date; these levels are indicated by the 'mixing' variable in the model (see inset plots).  The economic and other costs associated with continuing and deepening this regimen (e.g. to 20% or 10% contact) may not be sustainable.  To generate additional predictions with alternative timing and extent of measures, download the model and run this scenario with your own inputs.
Model fit for Italy data, with timeframe to end March 2020 (Day 0 = February 12).
Curves are model predictions and symbols are observed data.  Plot units are in brackets on the legend.
Potential impact of on/off restrictions in movement for Italy to December 2020.  With further limits on people movement [inset], peak ICU needs could be reached around the end of April (day 77 for Italy).  Relaxation of restrictions even for a short period could return demand towards peak levels.
To drill down further and run your own scenarios, download the model and run it.

In addition to the challenges of interpreting known case data, the amount of data available is also limited; therefore model parameters are uncertain and predictions are indicative only.  In particular, the R0 parameter we obtain by parameter estimation to known case data is 3.0 for Italy and 3.55 for Ireland, both higher than values generally quoted for China.  This could reflect differences in social contact patterns, age profile or the importation of cases due to air travel, which are not explicitly included in our model.  That both values are higher than those for China may also be an artifact of having too few data at present, and it makes the current model predict more severe impacts.

Monday, March 23, 2020

Coronavirus projections

We built a model over the last week using our Dynochem platform that analyzes reported case numbers and projects forward to the peak(s) of the outbreak, allowing interactive exploration of the effectiveness and timing of measures that could be implemented.  This model may be applied to the specific situation in any country or region, by fitting two parameters to case data for that region from the ECDC. You can get a copy of the model here; to run it, you'll need Dynochem installed.  In recent days some nice online simulators have also appeared and here is one example.

The current Dynochem model tracks the number of susceptible, infectious, isolated and cured patients versus time.  Detected cases are tracked, with a time lag after infection that reflects both the incubation period and rate of testing.  The rate of growth is fitted to regional data for detections (cumulative).
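The link between infections and detections described above can be sketched as a delayed copy of the cumulative infection curve. This is a hypothetical helper with a whole-day lag; detect_frac = 1.0 matches the first model's assumption that every infection is eventually detected, while later versions scale by a detection fraction.

```python
def lagged_detections(cumul_infected, lag_days, detect_frac=1.0):
    """Cumulative detected cases as a delayed, optionally scaled copy of
    cumulative infections, sampled once per day.

    cumul_infected[d] is the cumulative number infected by day d;
    lag_days is a whole number of days (hypothetical simplification).
    """
    detected = []
    for day in range(len(cumul_infected)):
        source = day - lag_days
        detected.append(detect_frac * cumul_infected[source] if source >= 0 else 0.0)
    return detected
```

A fixed lag is the simplest choice; a distributed delay would smooth the detected curve but adds parameters that early data cannot pin down.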
Schematic of current Dynochem model for COVID-19 outbreak
The effect of reduced movement/ contact of citizens is included as a mixing parameter, ranging from 1.0 with free movement to 0.0 with no movement at all.  [Because infection behaves like an un-premixed chemical reaction, classical chemical engineering concepts like intensity of segregation are relevant and the rate of reaction depends quite linearly on the number of infectious people that are moving around/ mixing.]

The Dynochem model assumes that all detected cases are isolated after detection and are no longer able to infect others; this is the current policy of health authorities.  However the number of actual infected cases in the early period can be many times more than those detected, so most of the infectious may be moving through the community when no restrictions are in place.

Parameters estimated from case data are i) the initial number of infectious cases 10 days before the first detected case and ii) the kinetic constant for growth of the outbreak during the exponential phase.  The predictions should be taken as indicative and useful for planning rather than definitive.  One can argue about methodology and assumptions and we may be able to take a more definitive approach when more data become available.
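The Dynochem fit estimates the growth constant inside the full model. As a simpler stand-in for a first estimate (a swapped-in technique, not the method used here), a least-squares line through the logarithm of cumulative detections during the exponential phase gives an initial level and a growth constant. The synthetic data in the check are illustrative only.

```python
import math


def fit_exponential_growth(days, cumulative_cases):
    """Least-squares fit of ln(cases) = ln(c0) + k * t.

    Returns (c0, k). Only meaningful during the exponential phase and for
    strictly positive case counts.
    """
    xs = list(days)
    ys = [math.log(c) for c in cumulative_cases]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c0 = math.exp(mean_y - k * mean_x)
    return c0, k
```

The doubling time follows as ln(2)/k; relating k to R0 requires assumptions about the incubation and infectious periods, which is where the full model comes in.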

A typical fit to detection data (for Ireland in this case) is shown in Figure 1.
Figure 1: Parameter fit to case numbers.  Curves are model predictions versus time; symbols are measured data.
Future case numbers can be predicted as shown in Figure 2, for a scenario in which people movement is unrestricted (worst case) and an example period of about 150 days.  The peak number of infectious (blue curve) could have been over 1.3 M without restrictions.
Figure 2: Worst case projected numbers of infectious people (blue) versus time, based on early case data and without restrictions on the movement of people.
'R0' terminology from the field of epidemiology is used to characterize the contagiousness of the spread. We explored the sensitivity to this variable [a property of the disease] as well as the degree of social contact / mixing [a property of how we respond].  In epidemiology these are often multiplied to give an 'effective R'.

Simulations like that in Figure 2 may be run many times over with different inputs; an example result from a series of about 300 such 'scenarios' is summarized in Figure 3.  The parameters varied are the contagiousness (R0, on y-axis) and the degree of people movement (mixing, on x-axis).  The example response plotted as a heat map is the number of people that would ultimately be infected.
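For constant mixing, a quick cross-check on a surface like Figure 3 is the classical final-size relation z = 1 - exp(-R_eff z) for SIR/SEIR-type models in a large, fully susceptible population. This is a swapped-in technique, not the method used to produce Figure 3; it assumes R_eff = R0 × mixing and ignores the detection and isolation details of the Dynochem model. The grid values below are illustrative.

```python
import math


def final_attack_fraction(r_eff, iterations=100):
    """Fraction ever infected, solving z = 1 - exp(-r_eff * z) by bisection.

    Returns 0 when r_eff <= 1 (no major outbreak in this approximation).
    """
    if r_eff <= 1.0:
        return 0.0
    lo, hi = 1.0 - 1.0 / r_eff, 1.0  # the positive root lies in this bracket
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # f(mid) = mid - (1 - exp(-r_eff * mid)); expm1 keeps small values accurate
        if mid + math.expm1(-r_eff * mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r0_values = [2.0, 2.5, 3.0, 3.5]
mixing_values = [0.2, 0.4, 0.6, 0.8, 1.0]
print("R0 / mixing " + " ".join(f"{m:5.1f}" for m in mixing_values))
for r0 in r0_values:
    row = [final_attack_fraction(r0 * m) for m in mixing_values]
    print(f"{r0:11.1f} " + " ".join(f"{z:5.2f}" for z in row))
```

With R0 = 3 and mixing = 1 the relation gives roughly 94% of the population infected, in line with the worst case read from Figure 3.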
Figure 3: Contour levels / colours indicate the number of people that could be infected, versus contagiousness (y-axis) and people movement (x-axis).
Figure 3 indicates, taking the fitted R0=3 (at bottom of plot) as the most likely case, that most of the population could be infected in a worst case scenario (mixing=1).

Additional results available include the timing of the peak (or peaks) in case numbers and the number of hospital beds required to accommodate patients.  The mixing variable can also be imposed as a profile versus time, so that various on/off strategies for people movement can be considered.  For example Figure 4 shows what may be an optimistic simulation of the effect of two periods of almost fully restricted movement, designed to reduce the number of infectious to near zero.
Figure 4: Infection rate and beds needed (assuming 5% of isolated patients require hospitalization) when two periods of very restricted movement are applied several months apart.
There is evidence from Ireland and several other European countries that recent restrictions are starting to slow the rate of infection.  The model indicates that restrictions of varying degrees may be required over an extended period.

We intend to update the model and its predictions here periodically with new data.

Monday, February 3, 2020

2019 round-up and looking forward to 2020

To ardent watchers of this blog - you know who you are - apologies for the pause in postings.  A lot has been happening since August 2019 as we follow our mission to accelerate process development and positively impact the development of every potential medicine.

We'll be posting more regularly in 2020, with lots of news and new capabilities in the pipeline.  In the meantime, here's a catch-up on some items from the latter part of 2019 and a picture that summarizes a few of them:

A few Scale-up Systems and Scale-up Suite highlights from the end of 2019; more details below.
  • We presented our sponsored AIChE award for Outstanding Contribution to QbD for Drug Substance to the 2019 winner, Zoltan Nagy of Purdue University, at the Annual Meeting in Orlando [pic - top right]
  • We nominated John Peterson of GSK for the corresponding Pfizer-sponsored Drug Product award, for his excellent work on statistics of design space, and he won.  Was great to catch up with John at the awards session [pic - top left]
  • Andrew Bird presented statistically rigorous calculations of design space for three common unit operations; and ways to dramatically accelerate the calculations [pic - top centre]; watch for the details in a 2020 webinar
  • A growing band of Dynochem and Reaction Lab users are keeping warm this winter in our 'beanie' hats, helping the environment with our keep-cups and looking forward to summer in our polo shirts [pic - bottom centre]
  • And we've updated our certification programs to a fully automated on-line system with randomized questions.  Visit the Resources site and search for 'certified' to find out more and take the test.

Thursday, December 21, 2017

Congratulations to Dr Jake Albrecht of BMS: Winner of AIChE QbD for Drug Substance Award, 2017

At AIChE Annual Meetings, Monday night is Awards night for the Pharma community, represented by PD2M.  This year in Minneapolis the award for Excellence in QbD for Drug Substance process development and scale-up went to Dr Jake Albrecht of Bristol-Myers Squibb.  Congratulations, Jake!


Winners are chosen by a blinded judging panel selected by the Awards Chair, currently Bob Yule of GSK.  Awards criteria are:
  • Requires contributions to the state of the art in the public domain (e.g. presentations, articles, publications, best practices)
  • Winner may be in Industry, Academia, Regulatory or other relevant working environment
  • Winner may be from any nation, working at any location
  • There are no age or experience limits
  • Preference is given to work that features chemical engineering
Jake was nominated by colleagues for:
  • his innovative application of modeling methodologies and statistics to enable quality by design process development
  • including one of the most downloaded papers in Computers and Chemical Engineering (2012-2013), “Estimating reaction model parameter uncertainty with Markov Chain Monte Carlo”
  • his leadership and exemplary efforts to promote increasing adoption of modeling and statistical approaches by scientists within BMS and without
  • his leadership in AIChE/PD2M through presentations, chairing meeting sessions, leading annual meeting programming and serving on the PD2M Steering Team
Scale-up Systems was delighted to be involved at the AIChE Annual Meeting this year in our continued sponsorship of this prize.  Some photos and video from the night made it onto our Facebook page and more should appear soon on the PD2M website.

Jake is also a DynoChem power user and delivered a guest webinar in 2013 on connecting DynoChem to other programs, such as MATLAB.

Thursday, June 2, 2011

Marcello Bosco of Roche, Basel wins prize for Best Presentation at DynoChem 2011 User Meeting, London

“We did promise you a blast of U2 and I think you’ll agree that it has, indeed, been a Beautiful Day.”  And the audience certainly agreed with Joe Hannon, as the discussions continued and attendees suggested new ways to use DynoChem as its capabilities develop.

Joe then outlined some of the exciting new developments that DynoChem users can expect in the future.  “Thanks to Steve Hearn and our Development Team, we’ve made DynoChem easier to roll out and made it quicker and simpler for your IT colleagues to manage".  


Forthcoming initiatives include a new interface in the DynoChem Resources website using Microsoft Silverlight to provide easier search and navigation.  In the meantime, Joe urged users to visit DynoChem Resources to find out more about other ways in which the software can help speed experimentation and reduce costs.

Not surprisingly, given the quality of the day's presentations, finding a winner of the prize for best presentation was tough, and when customer votes were counted, at first it seemed that there had been a three-way tie for first place.  However, a quick recount revealed that Marcello Bosco of Roche Basel (pictured above) was the outright winner with his discussion of "Thermal Scale-up – Vessel Characterization and Reaction Modelling with DynoChem".


Amid envious glances from colleagues, Marcello walked away with an iPad 2.  Great for tapping out ideas when inspiration strikes outside the lab!

Joe ended with a call to action.  “We’re proud to be sponsors of an AIChE award for excellence in quality by design, along with Pfizer and Merck, and I’d like to ask you to get in contact if you know of any additional candidates for the 2011 prize: people who are doing great work that should be recognised.”

When it comes to achieving these excellent results, DynoChem will be there, playing its part.

Monday, May 16, 2011

Jerry Salan of Nalas Engineering wins prize for best application presentation at DynoChem 2011 User Meeting

Congratulations to Jerry Salan of Nalas Engineering, who won the prize for Best Application Presentation at the DynoChem User Meeting in Chicago last week.  Jerry presented on “Pilot Scale Design and Continuous Manufacture of Novel Explosives Using Kinetic Modelling” and gave a great illustration of how to design a flow process quickly using a kinetic model and use the results to predict scale-up. Some even referred to Jerry's talk as 'DynoMite'.


Paul Thomas of PharmaQbD.com reported on the meeting and you can find out more here.
More pictures from this meeting and from the Mumbai and London meetings will be available shortly.

Tuesday, December 21, 2010

Registration open for DynoChem User Meetings 2011: Chicago, London and Mumbai

Registration has opened for the DynoChem 2011 User Meetings, taking place in Chicago, London and Mumbai, where pharmaceutical companies will share 'Recipes for Success' in April and May 2011.  More details at http://www.scale-up.com/UGM11.html.

Monday, December 6, 2010

Population balance modeling in crystallization - discussion

There's a LinkedIn discussion going on at present on population balance modeling in crystallization.  I posted there today as follows: 

Building a population balance model / framework is quick and easy, especially if you start from a readymade template. Making the model fit your data is more time-consuming and the quality of fit (and confidence ellipsoids) may be reasonable, though the relevant 'kernels' may be too simple or too averaged to fit really well. Making accurate predictions with your model, especially with a change of scale and equipment configuration, adds a further layer of effort to be successful, as the balance among the important phenomena may shift away from your original conditions. In some industries, this level of effort may be indulged; in the fast moving pharmaceutical development project, it usually will not, depending on the phase of development and the questions that need to be answered.
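As an illustration of how quickly a basic population balance framework can be set up, the sketch below uses the standard method of moments for a well-mixed batch with size-independent growth and nucleation at zero size, integrating the first four moments of the crystal size distribution. It assumes constant nucleation rate B and growth rate G purely for illustration; in a real model these 'kernels' would be functions of supersaturation and temperature fitted to data, which is where most of the effort goes. All numerical values are hypothetical.

```python
def moment_rhs(mu, B, G):
    """Moment equations for size-independent growth and nucleation at zero size.

    mu[j] is the j-th moment of the number density (per unit volume);
    B is the nucleation rate (number per volume per time), G the growth rate (length per time).
    """
    # d(mu_0)/dt = B;  d(mu_j)/dt = j * G * mu_(j-1) for j >= 1
    return [B] + [j * G * mu[j - 1] for j in range(1, len(mu))]


def integrate_moments(B, G, t_end, n_steps=1000, n_moments=4):
    """Integrate the moments from a crystal-free solution using classical RK4."""
    dt = t_end / n_steps
    mu = [0.0] * n_moments
    for _ in range(n_steps):
        k1 = moment_rhs(mu, B, G)
        k2 = moment_rhs([m + 0.5 * dt * k for m, k in zip(mu, k1)], B, G)
        k3 = moment_rhs([m + 0.5 * dt * k for m, k in zip(mu, k2)], B, G)
        k4 = moment_rhs([m + dt * k for m, k in zip(mu, k3)], B, G)
        mu = [m + dt / 6.0 * (a + 2 * b + 2 * c + d)
              for m, a, b, c, d in zip(mu, k1, k2, k3, k4)]
    return mu


# Hypothetical constant kinetics: B in #/(m^3 s), G in m/s, one hour of operation
B, G, t_end = 1.0e6, 1.0e-7, 3600.0
mu = integrate_moments(B, G, t_end)
number_density = mu[0]          # crystals per m^3
mean_size = mu[1] / mu[0]       # number-weighted mean size, m
print(f"{number_density:.3e} crystals/m^3, mean size {mean_size * 1e6:.0f} um")
```

With constant B and G the moments have closed forms (mu_0 = B t, mu_1 = G B t^2/2, mu_3 = G^3 B t^4/4), which makes a convenient check on the framework before fitted kinetics are substituted.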

Common sense tallies with our experience in this area: users need to understand and model first the mass balance, then the energy balance and only then, possibly, the number (population) balance. In many projects, trying to start at the wrong end (the population balance) only reveals that the mass and energy balances are not understood. On the other hand, starting with mass (concentration, addition rate, solubility) and energy (temperature, cooling / evaporation rate) alongside a basic evaluation of equipment characteristics (agitation, solids suspension, heat transfer resistance) often leads to insight that solves problems without requiring a population balance approach.
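A first mass balance check of this kind needs little more than a solubility curve. The sketch below assumes a hypothetical exponential solubility curve, C*(T) = a·exp(b·T) in g solute per kg solvent (a and b are illustrative, not data for any real compound), and computes the saturation temperature of a feed and the maximum (equilibrium) mass and yield obtainable by cooling. Crystallization kinetics, and hence any population balance, only determine how closely and how quickly a real process approaches this limit.

```python
import math

A_SOL, B_SOL = 20.0, 0.03  # hypothetical solubility parameters: g/kg solvent, 1/degC


def solubility(T_c):
    """Hypothetical equilibrium solubility, g solute per kg solvent, at T_c (degC)."""
    return A_SOL * math.exp(B_SOL * T_c)


def saturation_temperature(c0):
    """Temperature (degC) at which a solution of c0 g/kg solvent is exactly saturated."""
    return math.log(c0 / A_SOL) / B_SOL


def cooling_yield(c0, T_end, solvent_kg):
    """Equilibrium mass crystallized (kg) and fractional yield on cooling to T_end."""
    c_end = solubility(T_end)
    crystallized_kg = max(c0 - c_end, 0.0) * solvent_kg / 1000.0  # nothing crystallizes if undersaturated
    dissolved_kg = c0 * solvent_kg / 1000.0
    return crystallized_kg, crystallized_kg / dissolved_kg


c0 = solubility(60.0)  # feed saturated at 60 degC
mass, yld = cooling_yield(c0, T_end=5.0, solvent_kg=100.0)
print(f"saturated at {saturation_temperature(c0):.1f} degC; "
      f"{mass:.2f} kg crystallized, yield {yld:.1%}")
```

For this exponential form the equilibrium yield reduces to 1 - exp(b·(T_end - T_sat)), so the sensitivity of yield to end temperature can be read straight from the solubility curve before any rate data are collected.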

In these more routine projects, on-line size data (such as FBRM) are useful as a diagnostic and to provide trend information.

Thursday, June 4, 2009

Ingredients for a design space based on probability of success

Previous posts have referred to work by DynoChem and others to provide tools to quantify uncertainty in model predictions and translate that into the (joint) probability of successfully meeting several specifications, such as CQAs, at a particular set of processing conditions (factors, or process parameters). The question of how best to calculate this probability, for any process model and set of experimental data, is not straightforward to answer.

Many readers will be at least casually aware of two schools of thought in the statistics community: 'frequentist' statistics, which most of us learned in school and university and use to some degree every day, and 'Bayesian' statistics. The former calculates probability from the frequency with which a certain outcome is observed; the latter refines an initial subjective estimate of probability (the 'prior') using new information from observations. Good discussions of these alternative approaches are available all over the web and elsewhere; e.g. http://www.rasmusen.org/x/2007/09/25/bayesian-vs-frequentist-statistical-theory/; and for a longer read http://nb.vse.cz/kfil/elogos/science/vallverdu08.pdf.
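To make the distinction concrete, consider estimating the probability that a batch meets specification after observing k successes in n runs. The sketch below contrasts a frequentist point estimate with a Wilson score interval against a Bayesian beta-binomial update. The Beta(alpha, beta) prior is an assumption the user must supply; a uniform Beta(1, 1) prior and the run counts shown are purely illustrative.

```python
import math


def frequentist_estimate(k, n, z=1.96):
    """Observed success frequency and its Wilson score interval (approx. 95% for z = 1.96)."""
    p_hat = k / n
    denom = 1.0 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return p_hat, (centre - half, centre + half)


def bayesian_update(k, n, alpha=1.0, beta=1.0):
    """Beta prior + binomial data -> Beta posterior; returns posterior parameters, mean, sd."""
    a, b = alpha + k, beta + (n - k)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return (a, b), mean, sd


k, n = 9, 10  # hypothetical: 9 of 10 runs met all specifications
p_hat, interval = frequentist_estimate(k, n)
(a, b), post_mean, post_sd = bayesian_update(k, n)
print(f"frequentist: {p_hat:.3f}, Wilson interval {interval[0]:.3f}-{interval[1]:.3f}")
print(f"Bayesian (uniform prior): Beta({a:.0f}, {b:.0f}), mean {post_mean:.3f} +/- {post_sd:.3f}")
```

With so little data the prior carries visible weight (a posterior mean of 10/12 against an observed frequency of 0.9); as n grows, the two estimates converge.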

Whatever about the specifics and relative merits of these approaches, both provide useful insight for design space development by taking explicit account of uncertainty and risk in a multivariate system and published examples of both, as well as their inclusion in regulatory filings, will become increasingly common. Members of DynoChem Resources can access knowledge base articles and other useful materials in this context.

In this posting I am concerned with what goes before the probability calculations; specifically the modelling effort and data to support it. Unless the underlying data and modeling are sound, probability calculations, however advanced the calculation procedure, will have little or no meaning.

With the emphasis on chemical reactions in API synthesis (e.g. the final step), and after the solvent, catalyst and reagents have been selected, the important ingredients in the mixture, whatever statistical approach is ultimately used, are:

1. upfront thinking on a mechanistic basis to determine factors and settings for initial screening experiments; supported by prior data if relevant data exist (see previous posts on process schemes);
2. screening experiments in which the process is followed by taking multiple samples; some of these experiments should screen for physical rate limitations and aim to determine whether physical or chemical phenomena are 'rate-limiting';
3. characterization experiments, in which factors affecting the limiting phenomena are studied across a range of settings; the extremities and some centre-points (with replication) may be adequate for a mechanistic model; a larger set of experiments may be required using a statistically designed (DOE) program of experiments; responses Y are measured as a function of factors X;
4. a modeling effort alongside 3 in which the relationship between Y and X is captured in either a mechanistic or DOE model, or both; the lack of fit and other statistics relating to model uncertainty are quantified; further experiments to reduce uncertainty may be merited and/or improvements in the experimental or analytical technique; data from a portion of experiments should be used for model development and the remaining experiments for model verification; ultimately a single model should fit all of the reliable data; the mechanistic model in particular may be used to extrapolate to determine 'optimum' conditions outside the ranges studied to date; note that experimental data can be one of the least reliable inputs to a model, for a host of practical reasons; unreliability of experimental data (e.g. lack of mole or mass balance) may only be noticed if the model has a mechanistic basis;
5. criticality studies, to determine the proximity to edge of failure for limiting factors; these can leverage a mechanistic model if one exists; otherwise will require further experiments to extrapolate or mimic likely failure modes;
6. factor space exploration; this may be a very broad, full factorial exploration with a mechanistic model, or a narrower exploration using a further set of DOE experiments; in either case, model uncertainty and/or experimental error are taken into account; with the mechanistic model only, we can add formulas for derived responses that were not, or cannot easily be, measured (e.g. pass time, fail time); an important feature of a mechanistic model is that one set of model parameters fits all responses, not one set per response.
7. design space definition; for a limited set of factors, this defines the relationship among their ranges that produces product of acceptable quality; until recently, overlaying the response surfaces for each CQA was considered adequate; a more reliable approach is to calculate the probability of success across the factor space, leading to a direct estimate of the associated risk of failure and a narrower design space; here the relative merits of Bayesian and frequentist statistics may become relevant;
8. confirmatory experiments to show that operating within the design space provides the required level of assurance of quality;
9. with a mechanistic model only: demonstrate to colleagues, management, regulators, manufacturing and quality control that a high level of process understanding has been achieved, otherwise the mechanistic model would not fit the data; justify the scale-independence of the design space; demonstrate the impact of scale-up on the CQA by predicting performance in large scale equipment.
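Steps 6 and 7 can be sketched with a hypothetical mechanistic model. The example below assumes first-order consecutive kinetics A -> P -> I (I an impurity), Arrhenius temperature dependence and illustrative specifications of at most 1% residual A and 0.5% I. All rate constants, activation energies and their uncertainties are invented, and the covariance matrix is taken as diagonal for simplicity, whereas a fitted model would supply a full covariance matrix. Parameter sets are sampled from that distribution, and the joint probability of meeting both specifications is estimated at each temperature and time on a grid.

```python
import numpy as np

R = 8.314        # gas constant, J/(mol K)
T_REF = 323.15   # K; hypothetical reference temperature (50 degC) for the rate constants


def rate_constant(k_ref, Ea, T):
    """Arrhenius rate constant at T (K), parameterized by its value at T_REF."""
    return k_ref * np.exp(-Ea / R * (1.0 / T - 1.0 / T_REF))


def profile(k1, k2, t):
    """Fractions of A, P and I for first-order A -> P -> I from pure A (k1 != k2)."""
    a = np.exp(-k1 * t)
    p = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return a, p, 1.0 - a - p


def probability_of_success(T, t, samples, max_a=0.01, max_i=0.005):
    """Fraction of sampled parameter sets for which both specifications are met at (T, t)."""
    ln_k1, ln_k2, Ea1, Ea2 = samples.T
    k1 = rate_constant(np.exp(ln_k1), Ea1, T)
    k2 = rate_constant(np.exp(ln_k2), Ea2, T)
    a, _, i = profile(k1, k2, t)
    return float(np.mean((a <= max_a) & (i <= max_i)))


# Hypothetical parameter estimates: ln k1, ln k2 (k in 1/min at T_REF), Ea1, Ea2 (J/mol)
mean = np.array([np.log(0.05), np.log(5e-5), 60e3, 90e3])
cov = np.diag(np.array([0.1, 0.2, 3e3, 3e3]) ** 2)   # diagonal: correlations ignored here
rng = np.random.default_rng(seed=1)
samples = rng.multivariate_normal(mean, cov, size=5000)

times = (60, 120, 180, 240)                          # min
for T_c in (40, 50, 60):                             # degC
    row = [probability_of_success(T_c + 273.15, t, samples) for t in times]
    print(f"{T_c} degC: " + "  ".join(f"{t} min {p:.2f}" for t, p in zip(times, row)))
```

Contouring such a grid at a chosen probability (for example 0.95) gives a candidate design space; because parameter uncertainty is included, it is typically narrower than the region where the best-fit model alone predicts success, consistent with step 7.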

The models developed above may be leveraged pre- and post-NDA in many other ways, including to guide process development, achieve yield or other business objectives, facilitate technology transfer and be used at-line. Mechanistic models in particular also offer new ways to define design space to maximize flexibility and be tolerant to minor process upsets.

Keen Bayesian statisticians reading the above will notice that a high degree of prior knowledge is used to develop these guidelines and to carry out the associated experimental and mechanistic modeling work; in that sense there is something very Bayesian about how mechanistic models are developed.

In the mechanistic approach, modeling takes place alongside experiments and new information leads to refinements in the model. The probability that the model is valid is thereby continually updated as new data are included, following Bayes' theorem, and it rises each time the model continues to fit the new data.

New data also add degrees of freedom to the fit (more observations relative to the number of fitted parameters), ultimately leading to sharper probability distributions for model responses, which is important for design space definition.
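The sharpening effect of new data can be illustrated with the simplest Bayesian case: a normal prior on a single parameter updated by direct, normally distributed observations of known noise. This is only an analogy, since mechanistic model parameters are related to data nonlinearly through the model; the prior, noise level and observations below are hypothetical.

```python
import math


def update_normal(prior_mean, prior_sd, observations, noise_sd):
    """Sequential conjugate update of a normal prior on a parameter theta,
    given observations y_i ~ Normal(theta, noise_sd**2) with known noise_sd.
    Returns the posterior (mean, sd) after each observation."""
    precision = 1.0 / prior_sd**2
    weighted_sum = prior_mean * precision
    history = []
    for y in observations:
        precision += 1.0 / noise_sd**2          # each observation adds information
        weighted_sum += y / noise_sd**2
        history.append((weighted_sum / precision, math.sqrt(1.0 / precision)))
    return history


# Hypothetical: vague prior on ln(k) and four direct estimates of it
history = update_normal(prior_mean=-3.0, prior_sd=1.0,
                        observations=[-2.6, -2.8, -2.7, -2.75], noise_sd=0.2)
for n, (mean, sd) in enumerate(history, start=1):
    print(f"after {n} observation(s): ln k = {mean:.3f} +/- {sd:.3f}")
```

The posterior standard deviation falls with every observation, roughly as one over the square root of the number of data points once the prior is outweighed.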

Wednesday, February 11, 2009

Confidence and prediction bands: the shape of things to come

Confidence bands and prediction intervals (or bands) are useful in quantifying the amount of uncertainty associated with model predictions. Quantifying uncertainty has grown in importance with the adoption of a risk based approach consistent with ICH Q8 and Q9, providing a basis for estimating the probability of successful operation at a given set of conditions and thereby defining a design space.

Confidence bands for linear models such as those used in statistics software packages are often ‘u-shaped’ like those shown in Figure 1. Users who are familiar with these plots may expect similarly shaped confidence bands for other types of model. In fact, u-shaped confidence bands are the exception rather than the rule, as discussed in the knowledge base article available here (login required).

Prediction intervals (or bands) are wider than confidence bands and tend to run more parallel with average responses.

U-shaped confidence bands (indicated in Figure 1 by the blue curves around the best fit line) are observed when fitting a linear model with an intercept (y=mx+c), i.e. when both the slope and the intercept are fitted. In these cases, confidence bands are narrowest at the average value of x (e.g. see reference 1) and widen on either side of this value.

When a linear model of form y=mx is fitted instead, confidence bands are no longer u-shaped, but run as straight lines diverging from the best fit line as x increases, as in Figure 2.
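The textbook least-squares formulas behind these two shapes can be sketched as follows. For y = mx + c the confidence half-width depends on (x0 - x̄)², so it is smallest at the mean of x; the prediction half-width adds one observation's variance and is therefore always wider. For y = mx the half-width is proportional to |x0|, giving straight lines through the origin. The data are hypothetical, and the t values are supplied by hand (two-sided 95% values for 4 and 5 degrees of freedom) to keep the sketch dependent on NumPy only.

```python
import numpy as np


def fit_with_intercept(x, y, x0, t_crit):
    """Least-squares fit of y = m*x + c; returns prediction at x0 with
    confidence and prediction half-widths (t_crit for n - 2 degrees of freedom)."""
    x, y, x0 = (np.asarray(v, dtype=float) for v in (x, y, x0))
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    m = np.sum((x - xbar) * (y - y.mean())) / sxx
    c = y.mean() - m * xbar
    s = np.sqrt(np.sum((y - (m * x + c)) ** 2) / (n - 2))           # residual standard deviation
    conf = t_crit * s * np.sqrt(1.0 / n + (x0 - xbar) ** 2 / sxx)        # narrowest at x0 = xbar
    pred = t_crit * s * np.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)  # adds one observation's noise
    return m * x0 + c, conf, pred


def fit_through_origin(x, y, x0, t_crit):
    """Least-squares fit of y = m*x; returns prediction at x0 and confidence
    half-width (t_crit for n - 1 degrees of freedom)."""
    x, y, x0 = (np.asarray(v, dtype=float) for v in (x, y, x0))
    n = x.size
    sxx0 = np.sum(x ** 2)
    m = np.sum(x * y) / sxx0
    s = np.sqrt(np.sum((y - m * x) ** 2) / (n - 1))
    conf = t_crit * s * np.abs(x0) / np.sqrt(sxx0)   # zero at x0 = 0, grows linearly with |x0|
    return m * x0, conf


# Hypothetical data; 2.776 and 2.571 are two-sided 95% t values for 4 and 5 degrees of freedom
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x0 = np.linspace(0.0, 7.0, 8)
_, conf_c, pred_c = fit_with_intercept(x, y, x0, t_crit=2.776)
_, conf_0 = fit_through_origin(x, y, x0, t_crit=2.571)
for xv, cc, pc, c0 in zip(x0, conf_c, pred_c, conf_0):
    print(f"x={xv:.0f}: y=mx+c conf {cc:.3f}, pred {pc:.3f}; y=mx conf {c0:.3f}")
```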



Mechanistic models of most interest for design space and QbD work are initial value problems, where the initial values of responses are known, the independent variable is time and the rates of change of those responses are calculated from ordinary differential equations.

The general procedure for obtaining asymptotic confidence bands for such a non-linear mechanistic model follows the same steps as the two linear cases above: calculation of the gradients of responses with respect to the fitted parameter values and matrix multiplication of these gradients with the covariance matrix of the fitted parameters.

The qualitative behaviour of the confidence bands can be deduced at certain limits without any calculations:
  • The initial values for integrated responses are not sensitive to the parameter estimates and therefore confidence bands for these have zero width at time zero, like the case of a linear model with no intercept.
  • The values for some integrated responses will become constant at long times, e.g. towards the end of a simulation when all rates of change have dropped to zero. These final values will again not be sensitive at those times to the parameter estimates and therefore the confidence bands will again have zero width.

Figure 3 shows one example of a response from a non-linear mechanistic model of this type, for product formation in a system of competing chemical reactions. This has typical confidence band behaviour for such a profile (confidence band width plotted in green, values on the right hand y-axis):

- zero confidence band width at the start
- maximum confidence band width when the product level is changing rapidly
- almost zero confidence band width at the end, when the reaction is nearly over.
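The general procedure described above (gradients of a response with respect to the fitted parameters, combined with their covariance matrix) can be sketched in a few lines. The example below is not the system of Figure 3: it assumes a hypothetical consecutive reaction A -> B -> C with an analytic solution, rate constants k1 and k2 and an invented covariance matrix, and it takes gradients by central finite differences. It reproduces the qualitative pattern listed above: zero width at time zero, a maximum at intermediate times while C is still changing, and near-zero width once C approaches its final value.

```python
import numpy as np


def product_c(params, t, a0=1.0):
    """[C](t) for consecutive first-order reactions A -> B -> C (k1 != k2), starting from pure A."""
    k1, k2 = params
    return a0 * (1.0 - (k2 * np.exp(-k1 * t) - k1 * np.exp(-k2 * t)) / (k2 - k1))


def confidence_half_width(model, params, cov, t, z=1.96, rel_step=1e-6):
    """Asymptotic (first-order) confidence half-width of model(params, t).

    Gradients with respect to each fitted parameter are taken by central differences,
    then combined with the parameter covariance matrix: var = diag(J cov J^T)."""
    params = np.asarray(params, dtype=float)
    columns = []
    for j in range(params.size):
        h = rel_step * max(abs(params[j]), 1e-8)
        up, down = params.copy(), params.copy()
        up[j] += h
        down[j] -= h
        columns.append((model(up, t) - model(down, t)) / (2.0 * h))
    jac = np.column_stack(columns)                       # shape (len(t), n_params)
    variance = np.einsum("ij,jk,ik->i", jac, cov, jac)   # diag(J cov J^T)
    return z * np.sqrt(np.clip(variance, 0.0, None))     # clip guards against rounding below zero


# Hypothetical fitted values (1/min) and covariance matrix for k1, k2
k_fit = [0.10, 0.05]
cov = np.array([[2.5e-5, 7.5e-6],
                [7.5e-6, 9.0e-6]])
t = np.linspace(0.0, 400.0, 401)   # min
width = confidence_half_width(product_c, k_fit, cov, t)
print(f"width at t=0: {width[0]:.2e}; max {width.max():.4f} at t={t[width.argmax()]:.0f} min; "
      f"at t=400: {width[-1]:.2e}")
```

Here z = 1.96 gives an approximate large-sample 95% band; with few data, a t value for the residual degrees of freedom would be used instead.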


References
1. George E. P. Box, William G. Hunter and J. Stuart Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, John Wiley & Sons, 1978
