forked from OHDSI/TheBookOfOhdsi
-
Notifications
You must be signed in to change notification settings - Fork 1
/
SuggestedAnswers.Rmd
838 lines (604 loc) · 34.7 KB
/
SuggestedAnswers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
# Suggested Answers {#SuggestedAnswers}
This Appendix contains suggested answers for the exercises in the book.
## The Common Data Model {#Cdmanswers}
#### Exercise \@ref(exr:exerciseJohnPerson) {-}
Based on the description in the exercise, John's record should look like Table \@ref(tab:johnPerson).
Table: (\#tab:johnPerson) The PERSON table.
Column name|Value|Explanation
:---------------------|:-----------|:--------------------------------------
|PERSON_ID|2|A unique integer.|
|GENDER_CONCEPT_ID|8507|The concept ID for male gender is [8507](http://athena.ohdsi.org/search-terms/terms/8507).|
|YEAR_OF_BIRTH|1974||
|MONTH_OF_BIRTH|8||
|DAY_OF_BIRTH|4||
|BIRTH_DATETIME|1974-08-04 00:00:00|When the time is not known midnight is used.|
|DEATH_DATETIME|NULL||
|RACE_CONCEPT_ID|8516|The concept ID for black or African American is [8516](http://athena.ohdsi.org/search-terms/terms/8516).|
|ETHNICITY_ CONCEPT_ID|38003564| [38003564](http://athena.ohdsi.org/search-terms/terms/38003564) refers to "Not hispanic".|
|LOCATION_ID||His address is not known.|
|PROVIDER_ID||His primary care Provider is not known.|
|CARE_SITE||His primary Care Site is not known.|
|PERSON_SOURCE_ VALUE|NULL|Not provided.|
|GENDER_SOURCE_ VALUE|Man|The text used in the description.|
|GENDER_SOURCE_ CONCEPT_ID|0||
|RACE_SOURCE_ VALUE|African American|The text used in the description.|
|RACE_SOURCE_ CONCEPT_ID|0||
|ETHNICITY_SOURCE_ VALUE|NULL||
|ETHNICITY_SOURCE_ CONCEPT_ID|0||
#### Exercise \@ref(exr:exerciseJohnOp) {-}
Based on the description in the exercise, John's record should look like Table \@ref(tab:johnOp).
Table: (\#tab:johnOp) The OBSERVATION_PERIOD table.
Column name|Value|Explanation
:----------------------|:----------|:--------------------------------------
|OBSERVATION_ PERIOD_ID|2|A unique integer.|
|PERSON_ID|2|This is a foreign key to John's record in the PERSON table.|
|OBSERVATION_PERIOD_ START_DATE|2015-01-01|The date of enrollment.|
|OBSERVATION_PERIOD_ END_DATE|2019-07-01|No data can be expected after the data extraction date.|
|PERIOD_TYPE_ CONCEPT_ID|44814722| [44814724](http://athena.ohdsi.org/search-terms/terms/44814722) refers to "Period while enrolled in insurance".|
#### Exercise \@ref(exr:exerciseJohnDrug) {-}
Based on the description in the exercise, John's record should look like Table \@ref(tab:johnDrug).
Table: (\#tab:johnDrug) The DRUG_EXPOSURE table.
Column name|Value|Explanation
:---------------------|:-----------|:--------------------------------------
|DRUG_EXPOSURE_ID|1001|Some unique integer|
|PERSON_ID|2|This is a foreign key to John's record in the PERSON table.|
|DRUG_CONCEPT_ID|19078461|The provided NDC code maps to Standard Concept [19078461](http://athena.ohdsi.org/search-terms/terms/19078461).|
|DRUG_EXPOSURE_ START_DATE|2019-05-01|The start date of the exposure to the drug.|
|DRUG_EXPOSURE_ START_DATETIME|2019-05-01 00:00:00|Midnight is used as the time is not known.|
|DRUG_EXPOSURE_ END_DATE|2019-05-31|Based on start date + days supply.|
|DRUG_EXPOSURE_ END_DATETIME|2019-05-31 00:00:00|Midnight is used as time is unknown.|
|VERBATIM_END_DATE|NULL|Not provided.|
|DRUG_TYPE_ CONCEPT_ID|38000177| [38000177](http://athena.ohdsi.org/search-terms/terms/38000177) indicates "Prescription written".|
|STOP_REASON|NULL||
|REFILLS|NULL||
|QUANTITY|NULL|Not provided.|
|DAYS_SUPPLY|30|As described in the exercise.|
|SIG|NULL|Not provided.|
|ROUTE_CONCEPT_ID|4132161| [4132161](http://athena.ohdsi.org/search-terms/terms/4132161) indicates "Oral".|
|LOT_NUMBER|NULL|Not provided.|
|PROVIDER_ID|NULL|Not provided.|
|VISIT_OCCURRENCE_ ID|NULL|No information on the visit was provided..|
|VISIT_DETAIL_ID|NULL||
|DRUG_SOURCE_ VALUE|76168009520|This is provided NDC code.|
|DRUG_SOURCE_ CONCEPT_ID|583945| [583945](http://athena.ohdsi.org/search-terms/terms/750264) represents the drug source value (NDC code "76168009520").|
|ROUTE_SOURCE_ VALUE|NULL||
#### Exercise \@ref(exr:exerciseGiBleedRecords) {-}
To find the set of records, we can query the CONDITION_OCCURRENCE table:
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT *
FROM @cdm.condition_occurrence
WHERE condition_concept_id = 192671;"
result <- renderTranslateQuerySql(connection, sql, cdm = "main")
head(result)
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" CONDITION_OCCURRENCE_ID PERSON_ID CONDITION_CONCEPT_ID ...
1 4657 273 192671 ...
2 1021 61 192671 ...
3 5978 351 192671 ...
4 9798 579 192671 ...
5 9301 549 192671 ...
6 1997 116 192671 ...
")
```
#### Exercise \@ref(exr:exercisePersonSource) {-}
To find the set of records, we can query the CONDITION_OCCURRENCE table using the CONDITION_SOURCE_VALUE field:
```{r eval=FALSE}
sql <- "SELECT *
FROM @cdm.condition_occurrence
WHERE condition_source_value = 'K92.2';"
result <- renderTranslateQuerySql(connection, sql, cdm = "main")
head(result)
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" CONDITION_OCCURRENCE_ID PERSON_ID CONDITION_CONCEPT_ID ...
1 4657 273 192671 ...
2 1021 61 192671 ...
3 5978 351 192671 ...
4 9798 579 192671 ...
5 9301 549 192671 ...
6 1997 116 192671 ...
")
```
#### Exercise \@ref(exr:exercisePerson61Records) {-}
This information is stored in the OBSERVATION_PERIOD table:
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT *
FROM @cdm.observation_period
WHERE person_id = 61;"
renderTranslateQuerySql(connection, sql, cdm = "main")
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" OBSERVATION_PERIOD_ID PERSON_ID OBSERVATION_PERIOD_START_DATE ...
1 61 61 1968-01-21 ...
")
```
## Standardized Vocabularies {#Vocabanswers}
#### Exercise \@ref(exr:exerciseVocab1) {-}
Concept ID 192671 ("Gastrointestinal hemorrhage")
#### Exercise \@ref(exr:exerciseVocab2) {-}
ICD-10CM codes:
- K29.91 "Gastroduodenitis, unspecified, with bleeding"
- K92.2 "Gastrointestinal hemorrhage, unspecified"
ICD-9CM codes:
- 578 "Gastrointestinal hemorrhage"
- 578.9 "Hemorrhage of gastrointestinal tract, unspecified"
#### Exercise \@ref(exr:exerciseVocab3) {-}
MedDRA preferred terms:
- "Gastrointestinal haemorrhage" (Concept ID 35707864)
- "Intestinal haemorrhage" (Concept ID 35707858)
## Extract Transform Load {#Etlanswers}
#### Exercise \@ref(exr:exerciseEtl1) {-}
A) Data experts and CDM experts together design the ETL
C) People with medical knowledge create the code mappings
B) A technical person implements the ETL
D) All are involved in quality control
#### Exercise \@ref(exr:exerciseEtl2) {-}
Column | Value | Answer
:---------------- |:----------- |:-----------------------
PERSON_ID | A123B456 | This column has a data type of integer so the source record value needs to be translated to a numeric value.
GENDER_CONCEPT_ID | 8532 |
YEAR_OF_BIRTH | NULL | If we do not know the month or day of birth, we do not guess. A person can exist without a month or day of birth. If a person lacks a birth year that person should be dropped. This person would have to be dropped due to now year of birth.
MONTH_OF_BIRTH | NULL |
DAY_OF_BIRTH | NULL |
RACE_CONCEPT_ID | 0 | The race is WHITE which should be mapped to 8527.
ETHNICITY_CONCEPT_ ID | 8527 | No ethnicity was provided, this should be mapped to 0.
PERSON_SOURCE_ VALUE | A123B456 |
GENDER_SOURCE_ VALUE | F |
RACE_SOURCE_VALUE | WHITE |
ETHNICITY_SOURCE_ VALUE | NONE PROVIDED |
#### Exercise \@ref(exr:exerciseEtl3) {-}
Column | Value
:------------------------ |:-------------
VISIT_OCCURRENCE_ID | 1
PERSON_ID | 11
VISIT_START_DATE | 2004-09-26
VISIT_END_DATE | 2004-09-30
VISIT_CONCEPT_ID | 9201
VISIT_SOURCE_VALUE | inpatient
## Data Analytics Use Cases {#UseCasesanswers}
#### Exercise \@ref(exr:exerciseUseCases1) {-}
1. Characterization
2. Patient-level prediction
3. Population-level estimation
#### Exercise \@ref(exr:exerciseUseCases2) {-}
Probably not. Defining a non-exposure cohort that is comparable to your diclofenac exposure cohort is often impossible, since people take diclofenac for a reason. This precludes a between-person comparison. It might possible to a within-person comparison, so for each patient in the diclofenac cohort identifying time when they are not exposed, but a similar problem occurs here: these times are likely incomparable, because there are reasons when at one time someone is exposed and at other times not.
## SQL and R {#SqlAndRanswers}
#### Exercise \@ref(exr:exercisePeopleCount) {-}
To compute the number of people we can simply query the PERSON table:
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT COUNT(*) AS person_count
FROM @cdm.person;"
renderTranslateQuerySql(connection, sql, cdm = "main")
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" PERSON_COUNT
1 2694
")
```
#### Exercise \@ref(exr:exerciseCelecoxibUsers) {-}
To compute the number of people with at least one prescription of celecoxib, we can query the DRUG_EXPOSURE table. To find all drugs containing the ingredient celecoxib, we join to the CONCEPT_ANCESTOR and CONCEPT tables:
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT COUNT(DISTINCT(person_id)) AS person_count
FROM @cdm.drug_exposure
INNER JOIN @cdm.concept_ancestor
ON drug_concept_id = descendant_concept_id
INNER JOIN @cdm.concept ingredient
ON ancestor_concept_id = ingredient.concept_id
WHERE LOWER(ingredient.concept_name) = 'celecoxib'
AND ingredient.concept_class_id = 'Ingredient'
AND ingredient.standard_concept = 'S';"
renderTranslateQuerySql(connection, sql, cdm = "main")
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" PERSON_COUNT
1 1844
")
```
Note that we use `COUNT(DISTINCT(person_id))` to find the number of distinct persons, considering that a person might have more than one prescription. Also note that we use the `LOWER` function to make our search for "celecoxib" case-insensitive.
Alternatively, we can use the DRUG_ERA table, which is already rolled up to the ingredient level:
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT COUNT(DISTINCT(person_id)) AS person_count
FROM @cdm.drug_era
INNER JOIN @cdm.concept ingredient
ON drug_concept_id = ingredient.concept_id
WHERE LOWER(ingredient.concept_name) = 'celecoxib'
AND ingredient.concept_class_id = 'Ingredient'
AND ingredient.standard_concept = 'S';"
renderTranslateQuerySql(connection, sql, cdm = "main")
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" PERSON_COUNT
1 1844
")
```
#### Exercise \@ref(exr:exerciseGiBleedsDuringCelecoxib) {-}
To compute the number of diagnoses during exposure we extend our previous query by joining to the CONDITION_OCCURRENCE table. We join to the CONCEPT_ANCESTOR table to find all condition concepts that imply a gastrointestinal haemorrhage:
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT COUNT(*) AS diagnose_count
FROM @cdm.drug_era
INNER JOIN @cdm.concept ingredient
ON drug_concept_id = ingredient.concept_id
INNER JOIN @cdm.condition_occurrence
ON condition_start_date >= drug_era_start_date
AND condition_start_date <= drug_era_end_date
INNER JOIN @cdm.concept_ancestor
ON condition_concept_id =descendant_concept_id
WHERE LOWER(ingredient.concept_name) = 'celecoxib'
AND ingredient.concept_class_id = 'Ingredient'
AND ingredient.standard_concept = 'S'
AND ancestor_concept_id = 192671;"
renderTranslateQuerySql(connection, sql, cdm = "main")
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat(" DIAGNOSE_COUNT
1 41
")
```
Note that in this case it is essential to use the DRUG_ERA table instead of the DRUG_EXPOSURE table, because drug exposures with the same ingredient can overlap, but drug eras can not. This could lead to double counting. For example, imagine a person received two drug drugs containing celecoxib at the same time. This would be recorded as two drug exposures, so any diagnoses occurring during the exposure would be counted twice. The two exposures will be merged into a single non-overlapping drug era.
## Defining Cohorts {#Cohortsanswers}
#### Exercise \@ref(exr:exerciseCohortsAtlas) {-}
We create initial event criteria encoding these requirements:
- New users of diclofenac
- Ages 16 or older
- With at least 365 days of continuous observation prior to exposure
When done, the cohort entry event section should look like Figure \@ref(fig:cohortsAtlasInitialEvents).
```{r cohortsAtlasInitialEvents, fig.cap='Cohort entry event settings for new users of diclofenac',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasInitialEvents.png")
```
The concept set expression for diclofenac should look like Figure \@ref(fig:cohortsAtlasConceptSet1), including the ingredient 'Diclofenac' and all of its descendant, thus including all drugs containing the ingredient diclofenac.
```{r cohortsAtlasConceptSet1, fig.cap='Concept set expression for diclofenac.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasConceptSet1.png")
```
Next, we require no prior exposure to any NSAID, as shown in Figure \@ref(fig:cohortsAtlasInclusion1).
```{r cohortsAtlasInclusion1, fig.cap='Requiring no prior exposure to any NSAID.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasInclusion1.png")
```
The concept set expression for NSAIDs should look like Figure \@ref(fig:cohortsAtlasConceptSet2), including the NSAIDs class and all of its descendant, thus including all drugs containing any NSAID.
```{r cohortsAtlasConceptSet2, fig.cap='Concept set expression for NSAIDs',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasConceptSet2.png")
```
Additionally, we require no prior diagnosis of cancer, as shown in Figure \@ref(fig:cohortsAtlasInclusion2).
```{r cohortsAtlasInclusion2, fig.cap='Requiring no prior cancer diagnosis.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasInclusion2.png")
```
The concept set expression for "Broad malignancies" should look like Figure \@ref(fig:cohortsAtlasConceptSet3), including the high level concept "Malignant neoplastic disease" and all of its descendant.
```{r cohortsAtlasConceptSet3, fig.cap='Concept set expression for broad malignancies',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasConceptSet3.png")
```
Finally, we define the cohort exit criteria as discontinuation of exposure (allowing for a 30-day gap), as shown in Figure \@ref(fig:cohortsAtlasExit).
```{r cohortsAtlasExit, fig.cap='Setting the cohort exit date.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/cohortsAtlasExit.png")
```
#### Exercise \@ref(exr:exerciseCohortsSql) {-}
For readability we here split the SQL into two steps. We first find all condition occurrences of myocardial infarction, and store these in a temp table called "#diagnoses":
```{r eval=FALSE}
library(DatabaseConnector)
connection <- connect(connectionDetails)
sql <- "SELECT person_id AS subject_id,
condition_start_date AS cohort_start_date
INTO #diagnoses
FROM @cdm.condition_occurrence
WHERE condition_concept_id IN (
SELECT descendant_concept_id
FROM @cdm.concept_ancestor
WHERE ancestor_concept_id = 4329847 -- Myocardial infarction
)
AND condition_concept_id NOT IN (
SELECT descendant_concept_id
FROM @cdm.concept_ancestor
WHERE ancestor_concept_id = 314666 -- Old myocardial infarction
);"
renderTranslateExecuteSql(connection, sql, cdm = "main")
```
We then select only those that occur during an inpatient or ER visit, using some unique COHORT_DEFINITION_ID (we selected '1'):
```{r eval=FALSE}
sql <- "INSERT INTO @cdm.cohort (
subject_id,
cohort_start_date,
cohort_definition_id
)
SELECT subject_id,
cohort_start_date,
CAST (1 AS INT) AS cohort_definition_id
FROM #diagnoses
INNER JOIN @cdm.visit_occurrence
ON subject_id = person_id
AND cohort_start_date >= visit_start_date
AND cohort_start_date <= visit_end_date
WHERE visit_concept_id IN (9201, 9203, 262); -- Inpatient or ER;"
renderTranslateExecuteSql(connection, sql, cdm = "main")
```
Note that an alternative approach would have been to join the conditions to the visits based on the VISIT_OCCURRENCE_ID, instead of requiring the condition date to fall within the visit start and end date. This would likely be more accurate, as it would guarantee that the condition was recorded in relation to the inpatient or ER visit. However, many observational databases do not record the link between visit and diagnose, and we therefore chose to use the dates instead, likely giving us a higher sensitivity but perhaps lower specificity.
Note also that we ignored the cohort end date. Often, when a cohort is used to define an outcome we are only interested in the cohort start date, and there is no point in creating an (ill-defined) cohort end date.
It is recommended to clean up any temp tables when no longer needed:
```{r eval=FALSE}
sql <- "TRUNCATE TABLE #diagnoses;
DROP TABLE #diagnoses;"
renderTranslateExecuteSql(connection, sql)
```
## Characterization {#Characterizationanswers}
#### Exercise \@ref(exr:exerciseCharacterization1) {-}
In ATLAS we click on ![](images/Characterization/atlasDataSourcesMenuItem.png) and select the data source we're interested in. We could select the Drug Exposure report, select the "Table" tab, and search for "celecoxib" as shown in Figure \@ref(fig:dataSourcesAtlas). Here we see that this particular database has exposures to various formulations of celecoxib. We could click on any of these drugs to get a more detailed view, for example showing age and gender distributions for these drugs.
```{r dataSourcesAtlas, fig.cap='Data source characterization.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/dataSourcesAtlas.png")
```
#### Exercise \@ref(exr:exerciseCharacterization2) {-}
Click on ![](images/Cohorts/cohortdefinition.png) and then "New cohort" to create a new cohort. Give the cohort a meaningful name (e.g. "Celecoxib new users") and go to the "Concept Sets" tab. Click on "New Concept Set", and give your concept set a meaningful names (e.g. "Celecoxib"). Open the ![](images/Cohorts/search-2.png) module, search for "celecoxib", restrict the Class to "Ingredient" and Standard Concept to "Standard", and click the ![](images/Cohorts/shoppingcart.png) to add the concept to your concept set as show in Figure \@ref(fig:conceptSearchAtlas).
```{r conceptSearchAtlas, fig.cap='Selecting the standard concept for the ingredient "celecoxib".',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/conceptSearchAtlas.png")
```
Click on the left arrow shown at the top left of Figure \@ref(fig:conceptSearchAtlas) to return to your cohort definition. Click on "+Add Initial Event" and then "Add Drug Era". Select your previously created concept set for the drug era criterion. Click on "Add attribute..." and select "Add First Exposure Criteria." Set the required continuous observation to at least 365 days before the index date. The result should look like Figure \@ref(fig:celecoxibCohortDefinition). Leave the Inclusion Criteria, Cohort Exit, and Cohort Eras section as they are. Make sure to save the cohort definition by clicking ![](images/Cohorts/save.png), and close it by clicking ![](images/SuggestedAnswers/close.png).
```{r celecoxibCohortDefinition, fig.cap='A simple celecoxib new user cohort definition.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/celecoxibCohortDefinition.png")
```
Now that we have our cohort defined, we can characterize it. Click on ![](images/Characterization/atlasCharacterizationMenuItem.png) and then "New Characterization". Give you characterization a meaningful name (e.g. "Celecoxib new users characterization"). Under Cohort Definitions, click on "Import" and select your recently created cohort definition. Under "Feature Analyses", click on "Import" and select at least one condition analysis and one drug analysis, for example "Drug Group Era Any Time Prior" and "Condition Group Era Any Time Prior". Your characterization definition should now look like Figure \@ref(fig:celecoxibCharacterization). Make sure to save the characterization settings by clicking ![](images/Cohorts/save.png).
```{r celecoxibCharacterization, fig.cap='Characterization settings.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/celecoxibCharacterization.png")
```
Click on the "Executions" tab, and click on "Generate" for one of the data sources. It may take a while for the generation to complete. When done, we can click on "View latest results". The resulting screen will look something like Figure \@ref(fig:celecoxibCharacterizationResults), showing for example that pain and arthropathy are commonly observed, which should not surprise use as these are indications for celecoxib. Lower on the list we may see conditions we were not expecting.
```{r celecoxibCharacterizationResults, fig.cap='Characterization settings.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/celecoxibCharacterizationResults.png")
```
#### Exercise \@ref(exr:exerciseCharacterization3) {-}
Click on ![](images/Cohorts/cohortdefinition.png) and then "New cohort" to create a new cohort. Give the cohort a meaningful name (e.g. "GI bleed") and go to the "Concept Sets" tab. Click on "New Concept Set", and give your concept set a meaningful names (e.g. "GI bleed"). Open the ![](images/Cohorts/search-2.png) module, search for "Gastrointestinal hemorrhage", and click the ![](images/Cohorts/shoppingcart.png) next to the top concept to add the concept to your concept set as show in Figure \@ref(fig:giBleedSearch).
```{r giBleedSearch, fig.cap='Selecting the standard concept for "Gastrointestinal hemorrhage".',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/giBleedSearch.png")
```
Click on the left arrow shown at the top left of Figure \@ref(fig:giBleedSearch) to return to your cohort definition. Open the "Concept Sets" tab again, and check "Descendants" next to the GI hemorrhage concept, as shown in Figure \@ref(fig:giBleedDescendants).
```{r giBleedDescendants, fig.cap='Adding all descendants to "Gastrointestinal hemorrhage".',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/giBleedDescendants.png")
```
Return to the "Definition" tab, click on "+Add Initial Event" and then "Add Condition Occurrence". Select your previously created concept set for the condition occurrence criterion. The result should look like Figure \@ref(fig:giBleedCohortDefinition). Leave the Inclusion Criteria, Cohort Exit, and Cohort Eras section as they are. Make sure to save the cohort definition by clicking ![](images/Cohorts/save.png), and close it by clicking ![](images/SuggestedAnswers/close.png).
```{r giBleedCohortDefinition, fig.cap='A simple gastrointestinal bleed cohort definition.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/giBleedCohortDefinition.png")
```
Now that we have our cohort defined, we can compute the incidence rate. Click on ![](images/Characterization/atlasIncidenceMenuItem.png) and then "New Analysis". Give your analysis a meaningful name (e.g. "Incidence of GI bleed after celecoxib initiation"). Click "Add Target Cohort" and select our celecoxib new user cohort. Click on "Add Outcome Cohort" and add our new GI bleed cohort. Set the Time At Risk to end 1095 days after the start date. The analysis should now look like Figure \@ref(fig:irAnalysis). Make sure to save the analysis settings by clicking ![](images/Cohorts/save.png).
```{r irAnalysis, fig.cap='A incidence rate analysis.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/irAnalysis.png")
```
Click on the "Generation" tab, and click on "Generate". Select one of the data sources and click "Generate". When done, we can see the computed incidence rate and proportion, as shown in Figure \@ref(fig:irResults).
```{r irResults, fig.cap='Incidence results.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/irResults.png")
```
## Population-Level Estimation {#Pleanswers}
#### Exercise \@ref(exr:exercisePle1) {-}
We specify the default set of covariates, but we must exclude the two drugs we're comparing, including all their descendants, because else our propensity model will become perfectly predictive:
```{r eval=FALSE}
library(CohortMethod)
nsaids <- c(1118084, 1124300) # celecoxib, diclofenac
covSettings <- createDefaultCovariateSettings(
excludedCovariateConceptIds = nsaids,
addDescendantsToExclude = TRUE)
# Load data:
cmData <- getDbCohortMethodData(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
targetId = 1,
comparatorId = 2,
outcomeIds = 3,
exposureDatabaseSchema = "main",
exposureTable = "cohort",
outcomeDatabaseSchema = "main",
outcomeTable = "cohort",
covariateSettings = covSettings)
summary(cmData)
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat("CohortMethodData object summary
Treatment concept ID: 1
Comparator concept ID: 2
Outcome concept ID(s): 3
Treated persons: 1800
Comparator persons: 830
Outcome counts:
Event count Person count
3 479 479
Covariates:
Number of covariates: 389
Number of non-zero covariate values: 26923
")
```
#### Exercise \@ref(exr:exercisePle2) {-}
We create the study population following the specifications, and output the attrition diagram:
```{r eval=FALSE}
studyPop <- createStudyPopulation(
cohortMethodData = cmData,
outcomeId = 3,
washoutPeriod = 180,
removeDuplicateSubjects = "remove all",
removeSubjectsWithPriorOutcome = TRUE,
riskWindowStart = 0,
startAnchor = "cohort start",
riskWindowEnd = 99999)
drawAttritionDiagram(studyPop)
```
```{r echo=FALSE, out.width='80%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/attrition.png")
```
We see that we did not lose any subjects compared to the original cohorts, probably because the restrictions used here were already applied in the cohort definitions.
#### Exercise \@ref(exr:exercisePle3) {-}
We fit a simple outcome model using a Cox regression:
```{r eval=FALSE}
model <- fitOutcomeModel(population = studyPop,
modelType = "cox")
model
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat("Model type: cox
Stratified: FALSE
Use covariates: FALSE
Use inverse probability of treatment weighting: FALSE
Status: OK
Estimate lower .95 upper .95 logRr seLogRr
treatment 1.34612 1.10065 1.65741 0.29723 0.1044
")
```
It is likely that celecoxib users are not exchangeable with diclofenac users, and that these baseline differences already lead to different risks of the outcome. If we do not adjust for these difference, like in this analysis, we are likely producing biased estimates.
#### Exercise \@ref(exr:exercisePle4) {-}
We fit a propensity model on our study population, using all covariates we extracted. We then show the preference score distribution:
```{r eval=FALSE}
ps <- createPs(cohortMethodData = cmData,
population = studyPop)
plotPs(ps, showCountsLabel = TRUE, showAucLabel = TRUE)
```
```{r echo=FALSE, out.width='80%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/ps.png")
```
Note that this distribution looks a bit odd, with several spikes. This is because we are using a very small simulated dataset. Real preference score distributions tend to be much smoother.
The propensity model achieves an AUC of 0.63, suggested there are differences between target and comparator cohort. We see quite a lot overlap between the two groups suggesting PS adjustment can make them more comparable.
#### Exercise \@ref(exr:exercisePle5) {-}
We stratify the population based on the propensity scores, and compute the covariate balance before and after stratification:
```{r eval=FALSE}
strataPop <- stratifyByPs(ps, numberOfStrata = 5)
bal <- computeCovariateBalance(strataPop, cmData)
plotCovariateBalanceScatterPlot(bal,
showCovariateCountLabel = TRUE,
showMaxLabel = TRUE,
beforeLabel = "Before stratification",
afterLabel = "After stratification")
```
```{r echo=FALSE, out.width='70%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/scatter.png")
```
We see that various baseline covariates showed a large (>0.3) standardized difference of means before stratification (x-axis). After stratification, balance is increased, with the maximum standardized difference <= 0.1.
#### Exercise \@ref(exr:exercisePle6) {-}
We fit a outcome model using a Cox regression, but stratify it by the PS strata:
```{r eval=FALSE}
adjModel <- fitOutcomeModel(population = strataPop,
modelType = "cox",
stratified = TRUE)
adjModel
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat("Model type: cox
Stratified: TRUE
Use covariates: FALSE
Use inverse probability of treatment weighting: FALSE
Status: OK
Estimate lower .95 upper .95 logRr seLogRr
treatment 1.13211 0.92132 1.40008 0.12409 0.1068
")
```
We see the adjusted estimate is lower than the unadjusted estimate, and that the 95% confidence interval now includes 1. This is because we are now adjusting for baseline differences between the two exposure groups, thus reducing bias.
## Patient-Level Prediction {#Plpanswers}
#### Exercise \@ref(exr:exercisePlp1) {-}
We specify a set of covariate settings, and use the `getPlpData` function to extract the data from the database:
```{r eval=FALSE}
library(PatientLevelPrediction)
covSettings <- createCovariateSettings(
useDemographicsGender = TRUE,
useDemographicsAge = TRUE,
useConditionGroupEraLongTerm = TRUE,
useConditionGroupEraAnyTimePrior = TRUE,
useDrugGroupEraLongTerm = TRUE,
useDrugGroupEraAnyTimePrior = TRUE,
useVisitConceptCountLongTerm = TRUE,
longTermStartDays = -365,
endDays = -1)
plpData <- getPlpData(connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
cohortDatabaseSchema = "main",
cohortTable = "cohort",
cohortId = 4,
covariateSettings = covSettings,
outcomeDatabaseSchema = "main",
outcomeTable = "cohort",
outcomeIds = 3)
summary(plpData)
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat("plpData object summary
At risk cohort concept ID: -1
Outcome concept ID(s): 3
People: 2630
Outcome counts:
Event count Person count
3 479 479
Covariates:
Number of covariates: 245
Number of non-zero covariate values: 54079
")
```
#### Exercise \@ref(exr:exercisePlp2) {-}
We create a study population for the outcome of interest (in this case the only outcome for which we extracted data), removing subjects who experienced the outcome before they started the NSAID, and requiring 364 days of time-at-risk:
```{r eval=FALSE}
population <- createStudyPopulation(plpData = plpData,
outcomeId = 3,
washoutPeriod = 364,
firstExposureOnly = FALSE,
removeSubjectsWithPriorOutcome = TRUE,
priorOutcomeLookback = 9999,
riskWindowStart = 1,
riskWindowEnd = 365,
addExposureDaysToStart = FALSE,
addExposureDaysToEnd = FALSE,
minTimeAtRisk = 364,
requireTimeAtRisk = TRUE,
includeAllOutcomes = TRUE)
nrow(population)
```
```{r echo=FALSE,message=FALSE,eval=TRUE}
cat("[1] 2578")
```
In this case we have lost a few people by removing those that had the outcome prior, and by requiring a time-at-risk of at least 364 days.
#### Exercise \@ref(exr:exercisePlp3) {-}
We run a LASSO model by first creating a model settings object, and then calling the `runPlp` function. In this case we do a person split, training the model on 75% of the data and evaluating on 25% of the data:
```{r eval=FALSE}
lassoModel <- setLassoLogisticRegression(seed = 0)
lassoResults <- runPlp(population = population,
plpData = plpData,
modelSettings = lassoModel,
testSplit = 'person',
testFraction = 0.25,
nfold = 2,
splitSeed = 0)
```
Note that for this example set the random seeds both for the LASSO cross-validation and for the train-test split to make sure the results will be the same on multiple runs.
We can now view the results using the Shiny app:
```{r eval=FALSE}
viewPlp(lassoResults)
```
This will launch the app as shown in Figure \@ref(fig:plpShiny). Here we see an AUC on the test set of 0.645, which is better than random guessing, but maybe not good enough for clinical practice.
```{r plpShiny, fig.cap='Patient-level prediction Shiny app.',echo=FALSE, out.width='100%', fig.align='center'}
knitr::include_graphics("images/SuggestedAnswers/plpShiny.png")
```
## Data Quality {#DataQualityanswers}
#### Exercise \@ref(exr:exerciseRunAchilles) {-}
To run ACHILLES:
```{r eval=FALSE}
library(ACHILLES)
result <- achilles(connectionDetails,
cdmDatabaseSchema = "main",
resultsDatabaseSchema = "main",
sourceName = "Eunomia",
cdmVersion = "5.3.0")
```
#### Exercise \@ref(exr:exerciseRunDQD) {-}
To run the Data Quality Dashboard:
```{r eval=FALSE}
DataQualityDashboard::executeDqChecks(
connectionDetails,
cdmDatabaseSchema = "main",
resultsDatabaseSchema = "main",
cdmSourceName = "Eunomia",
outputFolder = "C:/dataQualityExample")
```
#### Exercise \@ref(exr:exerciseViewDQD) {-}
To view the list of data quality checks:
```{r eval=FALSE}
DataQualityDashboard::viewDqDashboard(
"C:/dataQualityExample/Eunomia/results_Eunomia.json")
```