Module | Services | Data types | Constants |
---|---|---|---|
event | | AdInfo AltIds AssetPerformanceData AssignedId AttentionTime Component Event LazyComponents NavigationType Page PageView PageViewAdditional Podcast PodcastPlatform SuspectStatus Tag WebPerformanceData | |
Name | Value | Description |
---|---|---|
VALID | 0 | This event is valid and should be processed normally. |
INVALID_APP_RESUME_BUG | 1 | This event is invalid: we believe it was sent only because of a persistent bug in the mobile apps, which erroneously sent a page view notification after a device suspend-resume. |
INVALID_WEB_NOT_SUCCESSFUL | 3 | This event does not represent a usual page view, as the user was returned a non-200 status code (which we track by putting special Ophan tracking on the error page displayed). Unless you're specifically interested in errors, you should ignore these as page views. |
INVALID_ROBOT | 4 | This event was generated by robotic requests. Unless you're specifically interested in understanding JavaScript-enabled robotic traffic, you should ignore these as page views. |
INVALID_THIRD_PARTY_EMBED | 5 | This event was generated by the display of a Guardian media asset embedded on a third-party site. Unless you're specifically interested in understanding third-party embeds, you should ignore these as page views. |
INVALID_INTERNAL_GUARDIAN_TRAFFIC | 6 | This event appears to have been generated by a user inside the Guardian network. Unless you're specifically interested in understanding the behaviour of users inside the Guardian networks, you should ignore these as page views. |
INVALID_BLOCKED_URL | 7 | The event targets a URL that has been blocked because we suspect it is a persistent target of bots, which skews the analytics. |
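As a rough illustration of how a consumer might apply these status values, the sketch below mirrors them in a Python enum and keeps only page views marked VALID. It assumes events have already been decoded into plain dictionaries keyed by the Thrift field names, and it treats a missing `validity` as not VALID; both choices are assumptions of this sketch, not documented Ophan behaviour. The helper name `is_countable_page_view` is hypothetical.

```python
from enum import IntEnum


class SuspectStatus(IntEnum):
    """Mirror of the SuspectStatus values listed above (no value 2 is listed)."""
    VALID = 0
    INVALID_APP_RESUME_BUG = 1
    INVALID_WEB_NOT_SUCCESSFUL = 3
    INVALID_ROBOT = 4
    INVALID_THIRD_PARTY_EMBED = 5
    INVALID_INTERNAL_GUARDIAN_TRAFFIC = 6
    INVALID_BLOCKED_URL = 7


def is_countable_page_view(event: dict) -> bool:
    """Return True when the event carries a page view whose validity is VALID.

    Assumption: `event` is a dict decoded from the Thrift Event struct, and a
    page view with no validity set is conservatively treated as not VALID.
    """
    page_view = event.get("pageView")
    if page_view is None:
        return False  # not a page view event at all
    return page_view.get("validity") == SuspectStatus.VALID
```

A consumer interested in, say, error pages would instead compare against `SuspectStatus.INVALID_WEB_NOT_SUCCESSFUL` rather than discarding those events.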
https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming/type#value
NAVIGATE | 1 | |
RELOAD | 2 | |
BACK_FORWARD | 3 | |
PRERENDER | 4 |
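The browser reports `PerformanceNavigationTiming.type` as one of the strings `navigate`, `reload`, `back_forward` or `prerender` (see the MDN link above). A minimal sketch of mapping those strings onto the enum values listed here follows; the function name and the decision to return `None` for unrecognised strings are assumptions of this sketch, not part of the Ophan model.

```python
from enum import IntEnum
from typing import Optional


class NavigationType(IntEnum):
    """Mirror of the NavigationType values listed above."""
    NAVIGATE = 1
    RELOAD = 2
    BACK_FORWARD = 3
    PRERENDER = 4


def navigation_type_from_browser(value: str) -> Optional[NavigationType]:
    """Map a PerformanceNavigationTiming.type string to the enum.

    Returns None for strings that do not match a known member (an assumption
    of this sketch; a real client may prefer to omit the field instead).
    """
    try:
        return NavigationType[value.upper()]
    except KeyError:
        return None
```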
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | id | string | The actual id | required | |
2 | isNew | bool | Whether the id was generated and set for the first time on this request. If undefined, it's not known whether the id is a new one or not. | optional |
Ophan assigns various ids to things when necessary (on Web, by dropping cookies). This struct indicates the id and whether we freshly assigned an id on this request.
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | id | string | required | ||
2 | type | string | required | ||
3 | sectionId | string | optional | ||
4 | sectionName | string | optional | ||
5 | webTitle | string | optional | ||
6 | webUrl | string | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | name | string | required | ||
2 | version | string | optional | ||
3 | isBrowser | bool | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | episodeId | string | required | ||
2 | podcastId | string | required | ||
3 | platform | PodcastPlatform | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | url | url.Url | Url of the page served | required | |
4 | campaignCodes | set< | The values of all of the CMP and INTCMP parameters | optional | |
5 | platform | platform.Platform | The platform that served this page. Marked as optional because when new platforms are added and your thrift definition hasn't been updated, this will become empty. | optional | |
6 | section | string | The section id of the page. Network front pages (e.g. "/uk") are assigned to a section named "/" (a single slash). If we can find the page in CAPI, the section value comes from CAPI, otherwise it is currently extracted from the url. In the future, this value will only come from CAPI. Will always be undefined for non www.theguardian.com urls. | optional | |
7 | publicationDate | string | For most content pages, the publication date of the content, in ISO date format "YYYY-MM-DD", e.g. 2014-01-20. Note that this is currently extracted from the CAPI web publication date, falling back to the URL path. Fronts will never have a publication date. | optional | |
8 | contentTypes | set< | Returns some approximation of the content type, generated by the serving platform. Currently, next-gen produces a maximum of one entry whereas R2 generates a number of varying entries. Native mobile apps produce nothing. There is little consistency between the strings that the platforms report. As a result, you are strongly advised not to make use of this field unless you have a really good reason to do so. It will likely be removed at some point in the future. | optional | |
9 | renderedComponents | set< | The set of component names that were rendered on this page. Currently only reported by next-gen, this can be used to validate the effectiveness of particular components. While it is possible to send this information on the initial pageview, because they are loaded asynchronously, it is more effective to send this information in the follow-up WEB_ADDITIONAL submission, and so this is replicated out into the top level there. | optional | |
10 | tags | set< | The set of tags that were rendered on this page. This field is being tracked from June 2016. | optional | |
11 | podcast | Podcast | Additional information for pageviews that refer to a podcast | optional | |
12 | experiences | set< | Experience contains information about the rendering of the page. It may contain multiple comma-separated experiences. | optional | |
13 | productionOffice | string | The production office that produced the page | optional | |
14 | internalPageCode | string | The Guardian Path Manager Id (see https://github.com/guardian/path-manager), also formerly known as the 'R2' id. Uniquely identifies a content page, even if its URL is updated. This field was initially added to Ophan's model as a stable id for 'Evolving URLs' work, but it later transpired that the Content API (CAPI) Id was preferred: https://github.com/guardian/ophan/pull/3779#issuecomment-687167191 | optional | |
17 | capiId | string | The Guardian Content API (CAPI) Id. Uniquely identifies Content, even if its URL is updated. Changing a content's url is generally avoided, but may be done for SEO reasons (eg Coronavirus explainer). Having this stable identifier should make it possible to see a unified Ophan view of an article's performance, even as it changes URLs. See also 'Evolving URLs': https://docs.google.com/document/d/1s6xsGHcQOgdPBTbGYwqTXuCz90e3lHKOnkBFW6yCvqY/edit# | optional | |
15 | isContent | bool | True if the Guardian Content API considers this to be 'content'. These are all content: interactives, liveblogs, gallery pages, podcast pages, video pages, crosswords. These are not: fronts (including the UK network front, but also section fronts, e.g. the UK business front https://www.theguardian.com/uk/business) and tag pages (e.g. https://www.theguardian.com/tv-and-radio/doctor-who). | optional | |
16 | utmParameters | utmparameters.UtmParameters | If included, contains a set of UTM parameters included in the query string of the pageview. https://en.wikipedia.org/wiki/UTM_parameters Separate from query string params so that we can query based on UTM params, and separate from Google referral because UTM params are not exclusively used by Google. | optional |
Details about the page that was served to the user
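To make the `campaignCodes` description concrete, here is a minimal sketch that collects the values of every `CMP` and `INTCMP` query parameter from a page URL. How Ophan itself performs this extraction (case sensitivity, handling of empty values, and so on) is not stated in this document, so the function name and those details are assumptions.

```python
from typing import Set
from urllib.parse import parse_qs, urlsplit


def campaign_codes(url: str) -> Set[str]:
    """Collect the values of all CMP and INTCMP query parameters in a URL.

    Assumptions of this sketch: parameter names are matched case-sensitively
    and empty values are ignored (the parse_qs default).
    """
    query = parse_qs(urlsplit(url).query)
    codes: Set[str] = set()
    for name in ("CMP", "INTCMP"):
        codes.update(query.get(name, []))
    return codes
```

For a URL with no campaign parameters the function returns an empty set, which corresponds to leaving the optional field empty.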
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | validity | SuspectStatus | Whether we view this event as "suspect". Most consumers of this stream should ignore events that do not have a status of VALID. | optional | |
2 | page | Page | Details about the page displayed | required | |
3 | userAgent | ua.UserAgent | The user agent that made this request | optional | |
4 | location | geo.GeoLocation | Details about the location of the user who made this request | optional | |
7 | ipAddress | geo.IpAddress | IP details of the user who made this request | optional | |
5 | referrer | referrer.Referrer | Details about the referrer to this page view. Will only be present on _PAGE_VIEW events where a referrer was received. | optional | |
6 | httpStatus | i16 | The http status returned to the user for this page view | required | 200 |
8 | daysVisitedInLastWeek | i32 | The number of days in the previous week that the device has had a page view recorded on the Guardian | optional | |
9 | totalDaysVisited | i32 | Total number of days on which this browser has visited the Guardian, ever. (Or, at least, since the slab's records began, which is since June 2015.) Includes "today", so this number should never be less than 1. This field is accurately populated starting from 2016-06-09. | optional | |
10 | averageDaysBetweenRecentVisits | double | Calculated average days between recent visits. Note that this is calculated in a way that discounts older visits: min(56, daysSinceFirstVisit + 1) / daysVisitedInLast56days. This field is accurately populated starting from 2016-06-09. | optional | |
11 | regular | bool | Is this browser a regular visitor to the Guardian? i.e. totalDaysVisited >= 8 && averageDaysBetweenRecentVisits <= 7 | optional | |
12 | subscriptionType | subscription.SubscriptionType | optional | ||
13 | membershipTier | subscription.MembershipTier | optional | ||
14 | navigationType | NavigationType | The navigation metadata type associated with this event. | optional | |
15 | frequencyBucket | i16 | The frequency bucket associated with this event. | optional |
Details about a page view - only populated for _PAGE_VIEW event types
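The two visit-frequency formulas quoted in the table (`averageDaysBetweenRecentVisits` and `regular`) can be sketched as below. The function and parameter names are illustrative only, and rejecting a zero visit count is an assumption of this sketch, made because the current visit always counts as at least one day.

```python
def average_days_between_recent_visits(days_since_first_visit: int,
                                       days_visited_in_last_56_days: int) -> float:
    """min(56, daysSinceFirstVisit + 1) / daysVisitedInLast56days.

    Capping the window at 56 days is what discounts older visits.
    """
    if days_visited_in_last_56_days < 1:
        # Assumption: the current visit always counts, so zero is invalid input.
        raise ValueError("days_visited_in_last_56_days must be at least 1")
    return min(56, days_since_first_visit + 1) / days_visited_in_last_56_days


def is_regular(total_days_visited: int, average_days_between: float) -> bool:
    """totalDaysVisited >= 8 && averageDaysBetweenRecentVisits <= 7."""
    return total_days_visited >= 8 and average_days_between <= 7
```

For example, a browser first seen 100 days ago that visited on 8 of the last 56 days has an average of 56 / 8 = 7.0 days between visits, so it counts as regular provided it has at least 8 total days visited.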
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | attentionMs | i64 | Attention time spent on this page view (indicated by event.pageViewId) in milliseconds. | required | |
2 | componentAttention | map< | Map of component name to time in ms when attention time rules are true and the component is visible | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | adBlockerDetected | bool | If present, indicates that we have detected the presence or absence of ad blocking technology. If absent, this check was not performed. If we have previously sent an event with this field present, you should treat that value as still valid. (Currently, the check is only performed on initial page load, so this field will only ever be set in the event relating to the initial page view not on subsequent events relating to that page view.) | optional | |
2 | ads | list< | Details of one or more ads rendered. Currently, each ad rendering is sent to ophan in a separate event so there will only ever be one entry in this list. This is likely to change in the future, however. | optional | |
3 | OBSOLETE_customSegments | list< | This is a list of custom segments assigned to this user. Typically they will come from krux. Populated in the slab via dynamodb. | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | name | string | The name of the metric being captured | required | |
2 | timing | i64 | Value of window.performance.now at which metric was captured | required |
Asset performance data, e.g. JavaScript
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | dns | i64 | Time in ms that dns lookup took. Calculated by t.domainLookupEnd - t.domainLookupStart | required | |
2 | connection | i64 | Time in ms that connection to the server took. Calculated by t.connectEnd - t.connectStart | required | |
3 | firstByte | i64 | Time to first byte. Calculated by t.responseStart - t.connectEnd | required | |
4 | lastByte | i64 | First byte to last byte, or closed, including if from cache. Calculated by t.responseEnd - t.responseStart | required | |
5 | domContentLoadedEvent | i64 | From last byte of doc to start of domContentLoaded. Calculated by t.domContentLoadedEventStart - t.responseEnd | required | |
6 | loadEvent | i64 | documentLoaded to start of load event. Calculated by t.loadEventStart - t.domContentLoadedEventStart | required | |
7 | navType | i64 | The navigation type. Value of window.performance.navigation.type | required | |
8 | redirectCount | i64 | Number of redirects on current domain. Value of window.performance.navigation.redirectCount | required | |
9 | assetsPerformance | list< | List of key-value pairs representing custom Asset performance data, e.g. time at which JavaScript begins / finishes executing | optional |
Web performance data as captured from the browser performance timing api. In the descriptions below, "t" represents window.performance.timing
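The derived timings in the table can be computed from a snapshot of `window.performance.timing` as in the sketch below. It assumes the snapshot has been serialised to a dictionary of millisecond timestamps keyed by the browser's property names; the function name is hypothetical, and `navType`, `redirectCount` and `assetsPerformance` are left out because they are read directly rather than derived.

```python
def web_performance_data(t: dict) -> dict:
    """Derive the WebPerformanceData durations (in ms) from a timing snapshot.

    `t` is assumed to hold the window.performance.timing properties named in
    the table above, as millisecond timestamps.
    """
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "connection": t["connectEnd"] - t["connectStart"],
        "firstByte": t["responseStart"] - t["connectEnd"],
        "lastByte": t["responseEnd"] - t["responseStart"],
        "domContentLoadedEvent": t["domContentLoadedEventStart"] - t["responseEnd"],
        "loadEvent": t["loadEventStart"] - t["domContentLoadedEventStart"],
    }
```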
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | name | string | The name of this component | required | |
2 | loadTimeMs | i64 | How long, in milliseconds, that the component took to load. If absent, the component had not completed loading by the time the snapshot of lazy components was taken. | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | components | set< | The set of components that were loaded lazily on this page. Note: statically loaded components are reported in Page.components | required |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | OBSOLETE_OMNITURE_s_vi | string | OBSOLETE: The Guardian stopped using Omniture analytics in 2016. The Omniture user identifier, stored in the s_vi cookie. This was only ever available on Web events, not Native app events. | optional | |
2 | OBSOLETE_OMNITURE_s_sess | string | OBSOLETE: The Guardian stopped using Omniture analytics in 2016. The Omniture session identifier, stored in the s_sess cookie. This was only ever available on Web events, not Native app events. | optional | |
3 | OBSOLETE_kruxId | string | DEPRECATED - Used to be for the user's Krux identifier. | optional | |
4 | OBSOLETE_exactTargetSubscriberId | string | DEPRECATED - Exact target is no longer used. The Exact Target email subscriber id. Page Views in response to readers clicking on an email sent by Exact Target include the exact target subscriber id. This field is populated in that case. | optional | |
5 | ampViewId | string | Some form of hopefully-unique id for pageviews made on the Google AMP ('Accelerated Mobile Pages') platform (eg https://amp.theguardian.com/us-news/commentisfree/2016/feb/16/thomas-piketty-bernie-sanders-us-election-2016). We've historically had a hard time getting a unique pageview identifier out of the AMP platform. The original purpose for this field when it was added in 2019 was to allow Data & Insight to tie together page views with the ad view data from 'Google Ad Manager' (GAM - formerly known as 'DoubleClick for Publishers', or DFP). When this was reviewed in October 2020, it was posited that this had never happened, as no evidence was found in source control of a defined data-processing job performing that link (though it's not entirely clear whether this search may have missed ad-hoc use by D&I), and the primary interest in the field became joining with consent data in the Sourcepoint consent management system. | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | renderedComponents | set< | The set of component names that were rendered on this page. Currently only reported by next-gen, this can be used to validate the effectiveness of particular components. | optional | |
2 | experiences | set< | Experience contains information about the rendering of the page. It may contain multiple comma-separated experiences. | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
2 | uniqueEventId | string | Globally unique id associated with this event. Ophan never makes better than at-least-once delivery promises, so you must ensure that processing two events with the same uniqueEventId has no effect | required | |
3 | dt | i64 | The date time (in millis since epoch UTC) at which this event occurred. | required | |
10 | receivedDt | i64 | The date time (in millis since epoch UTC) at which this event was received by Ophan for processing. For web generated events, this is the same as dt. For native mobile app generated events, it might not be. | required | |
4 | pageViewId | string | The page view with which this event is associated. Ophan may send multiple events relating to the same page view, which may contain updates to any previously supplied data or new data. You should treat the one with the highest timestamp (dt) as the most accurate. | required | |
5 | browserId | AssignedId | The unique id associated with this browser. Currently this is maintained by setting a cookie for web events, or otherwise determined for native apps. | optional | |
6 | visitId | AssignedId | DEPRECATED - The unique id associated with this "visit". For web reports, the visit id is a refreshed session cookie that expires after 30 minutes of activity. Mobile apps do not currently set this value. | optional | |
7 | userId | string | If the user is logged in, the identity user id. | optional | |
17 | altIds | AltIds | Various other identifiers we may have to identify this user. | optional | |
8 | pageView | PageView | If populated, this event represents a page view. | optional | |
9 | attention | AttentionTime | If populated, this event includes attention time data. Note this will also be populated, typically with a value of zero, alongside a pageView value for page views generated by platforms that support attention time tracking. | optional | |
11 | ads | AdInfo | If populated, this event includes advertising-related information | optional | |
12 | perf | WebPerformanceData | If populated, this event includes web performance load information | optional | |
13 | media | media.MediaPlayback | If populated, this event includes data about media playback | optional | |
14 | ab | abtest.AbTestInfo | If populated, this event includes data about ab tests that the user was a member of | optional | |
15 | lazyComponents | LazyComponents | If populated, this event includes data about components that were lazily loaded. | optional | |
16 | quizEvent | quiz.QuizEvent | If populated, this event includes data about a quiz event. | optional | |
18 | inPageClick | inpageclick.InPageClick | If populated, this event includes data about a click that did not result in a page transition | optional | |
20 | interaction | interaction.Interaction | optional | ||
21 | inPrivateBrowsingMode | bool | Is this browser in private browsing mode? e.g. incognito mode in Google Chrome = true | optional | |
22 | ipConnectivity | ipv6.IpConnectivity | If populated, includes data about IPv6 connectivity | optional | |
23 | acquisition | acquisition.Acquisition | Acquisition of one of our current products, eg Contribution, Membership, Recurring Contribution | optional | |
24 | componentEvent | componentevent.ComponentEvent | An event relating to a component, e.g. a user clicking on the contribution CTA of an Epic component, or an Atom component being inserted into a page | optional | |
25 | additional | PageViewAdditional | After an initial pageview is recorded, we can send a follow-up WEB_ADDITIONAL submission, containing other information that is collected later in the pageview's lifetime, such as asynchronously loaded components. That information will be stored here. | optional | |
26 | sessionId | string | Identity user session identifier. We can use this value to identify the session a logged in user was using to view the page. Helps us keep track of how often users are logging in and out. | optional | |
27 | edition | nativeapp.Edition | If populated, this is the selected edition recorded at the time of this event. 'TrackerSubmission.IndividualSubmission.NativeAppSubmission.App.Edition' | optional | |
28 | consent | consent.Consent | After an initial pageview is recorded, we can send follow-up CONSENT submissions to capture the reader's consent. The reader can change the consent on the same pageview. | optional |
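Two consumer obligations follow from the `uniqueEventId` and `pageViewId` descriptions: processing the same event twice must have no effect, and among events for one page view the one with the highest `dt` is the most accurate. The sketch below shows one possible way to honour both for an in-memory batch, assuming events have been decoded into dictionaries keyed by the Thrift field names. The function names are hypothetical, and keeping only the single latest event per page view is a simplification: a real consumer may need to merge fields across events, and to track seen ids durably rather than per batch.

```python
from typing import Dict, Iterable, List


def dedupe_events(events: Iterable[dict]) -> List[dict]:
    """Drop repeat deliveries, keeping the first event seen per uniqueEventId."""
    seen = set()
    unique = []
    for event in events:
        event_id = event["uniqueEventId"]
        if event_id in seen:
            continue  # at-least-once delivery: ignore the duplicate
        seen.add(event_id)
        unique.append(event)
    return unique


def latest_per_page_view(events: Iterable[dict]) -> Dict[str, dict]:
    """Return, for each pageViewId, the deduplicated event with the highest dt."""
    latest: Dict[str, dict] = {}
    for event in dedupe_events(events):
        current = latest.get(event["pageViewId"])
        if current is None or event["dt"] > current["dt"]:
            latest[event["pageViewId"]] = event
    return latest
```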