Thrift module: event

Module: event

Data types:
AdInfo
AltIds
AssetPerformanceData
AssignedId
AttentionTime
Component
Event
LazyComponents
NavigationType
Page
PageView
PageViewAdditional
Podcast
PodcastPlatform
SuspectStatus
Tag
WebPerformanceData

Enumerations

Enumeration: SuspectStatus


VALID = 0
This event is valid and should be processed normally

INVALID_APP_RESUME_BUG = 1
This event is invalid because we think it was sent only as a result of a
persistent bug in the mobile apps, which erroneously sent a page view
notification after a device suspend/resume.

INVALID_WEB_NOT_SUCCESSFUL = 3
This event does not represent a normal page view, because the user was returned
a non-200 HTTP status code. (We track these by putting special Ophan tracking on
the error page that is displayed.)

Unless you're specifically interested in errors, you should ignore these as
page views.


INVALID_ROBOT = 4
This event was generated by robotic requests.

Unless you're specifically interested in understanding JavaScript-enabled
robotic traffic, you should ignore these as page views.


INVALID_THIRD_PARTY_EMBED = 5
This event was generated by the display of a Guardian media asset embedded on a
third-party site.

Unless you're specifically interested in understanding third-party embeds,
you should ignore these as page views.


INVALID_INTERNAL_GUARDIAN_TRAFFIC = 6
This event appears to have been generated by a user inside the Guardian network.

Unless you're specifically interested in understanding the behaviour of users
inside the Guardian network, you should ignore these as page views.


INVALID_BLOCKED_URL = 7
The event relates to a URL that has been blocked because we suspect it is a
persistent target of bots, whose traffic would skew the analytics.
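The advice above amounts to a simple filter: keep VALID events and drop the rest, unless you are studying one of the invalid categories. The following minimal Python sketch shows that filter. It assumes events have already been decoded into plain dicts keyed by the Thrift field names. The dict shape and the helper name countable_page_views are hypothetical and are not part of Ophan. A missing validity falls back to VALID, which matches the default declared on PageView.validity. Codes this enum does not know about (2, or values added later) are treated as not valid.

```python
from enum import IntEnum


class SuspectStatus(IntEnum):
    """Mirrors the SuspectStatus values listed above (2 is not used)."""
    VALID = 0
    INVALID_APP_RESUME_BUG = 1
    INVALID_WEB_NOT_SUCCESSFUL = 3
    INVALID_ROBOT = 4
    INVALID_THIRD_PARTY_EMBED = 5
    INVALID_INTERNAL_GUARDIAN_TRAFFIC = 6
    INVALID_BLOCKED_URL = 7


def countable_page_views(page_views, include=frozenset()):
    """Yield page views that should be counted (hypothetical helper).

    `page_views` are assumed to be dicts with an optional "validity" int.
    Statuses listed in `include` are kept as well, e.g. for robot studies.
    """
    keep = {SuspectStatus.VALID} | set(include)
    for pv in page_views:
        raw = pv.get("validity", SuspectStatus.VALID)  # default is VALID
        try:
            status = SuspectStatus(raw)
        except ValueError:
            continue  # unknown (e.g. newer) status: treat as not valid
        if status in keep:
            yield pv
```

With the default arguments, only VALID page views (explicit or defaulted) come through. Passing include={SuspectStatus.INVALID_ROBOT} adds robotic traffic back in.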


Enumeration: NavigationType

https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming/type#value



NAVIGATE = 1
RELOAD = 2
BACK_FORWARD = 3
PRERENDER = 4
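The four values appear to correspond by name to the string values of PerformanceNavigationTiming.type on the MDN page linked above ("navigate", "reload", "back_forward", "prerender"). The sketch below assumes that name-for-name mapping. The actual tracker code is not shown in this document, and the function name navigation_type_from_timing is hypothetical.

```python
from enum import IntEnum
from typing import Optional


class NavigationType(IntEnum):
    NAVIGATE = 1
    RELOAD = 2
    BACK_FORWARD = 3
    PRERENDER = 4


# Assumed mapping from PerformanceNavigationTiming.type strings (per MDN).
_BY_TIMING_TYPE = {
    "navigate": NavigationType.NAVIGATE,
    "reload": NavigationType.RELOAD,
    "back_forward": NavigationType.BACK_FORWARD,
    "prerender": NavigationType.PRERENDER,
}


def navigation_type_from_timing(timing_type: str) -> Optional[NavigationType]:
    """Map a browser timing type string to NavigationType; None if unrecognised."""
    return _BY_TIMING_TYPE.get(timing_type)
```

Returning None for unrecognised strings matches the field being optional on PageView: the field is simply left unset rather than guessed.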

Data structures

Struct: AssignedId

Key | Field | Type | Description | Requiredness | Default value
1 | id | string | The actual id. | required |
2 | isNew | bool | Whether the id was generated and set for the first time on this request. If undefined, it's not known whether the id is a new one or not. | optional |

Ophan assigns various ids to things when necessary (on the web, by dropping
cookies). This struct holds the id and indicates whether we freshly assigned
it on this request.
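Because isNew is optional, it effectively has three states: new, not new, and unknown. Treating a missing value as False would overcount returning browsers. The sketch below keeps the unknown case separate. The AssignedId dataclass and classify_ids are hypothetical Python mirrors of the struct, not generated Thrift code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AssignedId:
    id: str
    is_new: Optional[bool] = None  # None: not known whether the id is new


def classify_ids(ids: Iterable[AssignedId]) -> Counter:
    """Count ids as 'new', 'returning' or 'unknown' using the optional isNew flag."""
    counts = Counter()
    for assigned in ids:
        if assigned.is_new is None:
            counts["unknown"] += 1
        elif assigned.is_new:
            counts["new"] += 1
        else:
            counts["returning"] += 1
    return counts
```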

Struct: Tag

Key | Field | Type | Description | Requiredness | Default value
1 | id | string | | required |
2 | type | string | | required |
3 | sectionId | string | | optional |
4 | sectionName | string | | optional |
5 | webTitle | string | | optional |
6 | webUrl | string | | optional |

Struct: PodcastPlatform

Key | Field | Type | Description | Requiredness | Default value
1 | name | string | | required |
2 | version | string | | optional |
3 | isBrowser | bool | | optional |

Struct: Podcast

Key | Field | Type | Description | Requiredness | Default value
1 | episodeId | string | | required |
2 | podcastId | string | | required |
3 | platform | PodcastPlatform | | optional |

Struct: Page

Key | Field | Type | Description | Requiredness | Default value
1 | url | url.Url | URL of the page served. | required |
4 | campaignCodes | set<string> | The values of all of the CMP and INTCMP parameters. | optional |
5 | platform | platform.Platform | The platform that served this page. Marked as optional because, when new platforms are added and your Thrift definition hasn't been updated, this will be empty. | optional |
6 | section | string | The section id of the page. Network front pages (e.g. "/uk") are assigned to a section named "/" (a single slash). If we can find the page in CAPI, the section value comes from CAPI; otherwise it is currently extracted from the URL. In future, this value will only come from CAPI. Always undefined for URLs not on www.theguardian.com. | optional |
7 | publicationDate | string | For most content pages, the publication date of the content, in ISO date format "YYYY-MM-DD", e.g. 2014-01-20. This is currently extracted from the CAPI web publication date, falling back to the URL path. Fronts never have a publication date. | optional |
8 | contentTypes | set<string> | Some approximation of the content type, generated by the serving platform. Currently, next-gen produces at most one entry, whereas R2 generates a varying number of entries. Native mobile apps produce nothing. There is little consistency between the strings that the platforms report, so you are strongly advised not to use this field unless you have a really good reason to. It will likely be removed in future. | optional |
9 | renderedComponents | set<string> | The set of component names that were rendered on this page. Currently only reported by next-gen, this can be used to validate the effectiveness of particular components. Although this information can be sent on the initial page view, components are loaded asynchronously, so it is more effective to send it in the follow-up WEB_ADDITIONAL submission; this field is therefore replicated at the top level of that submission. | optional |
10 | tags | set<Tag> | The set of tags that were rendered on this page. Tracked from June 2016. | optional |
11 | podcast | Podcast | Additional information for page views that refer to a podcast. | optional |
12 | experiences | set<string> | Information about the rendering of the page. May contain multiple comma-separated experiences. | optional |
13 | productionOffice | string | The production office that produced the page. | optional |
14 | internalPageCode | string | The Guardian Path Manager id (see https://github.com/guardian/path-manager), formerly also known as the 'R2' id. Uniquely identifies a content page, even if its URL is updated. This field was initially added to Ophan's model as a stable id for the 'Evolving URLs' work, but it later transpired that the Content API (CAPI) id was preferred: https://github.com/guardian/ophan/pull/3779#issuecomment-687167191 | optional |
17 | capiId | string | The Guardian Content API (CAPI) id. Uniquely identifies content, even if its URL is updated. Changing a piece of content's URL is generally avoided, but may be done for SEO reasons (e.g. the Coronavirus explainer). This stable identifier should make it possible to see a unified Ophan view of an article's performance, even as it changes URLs. See also 'Evolving URLs': https://docs.google.com/document/d/1s6xsGHcQOgdPBTbGYwqTXuCz90e3lHKOnkBFW6yCvqY/edit# | optional |
15 | isContent | bool | True if the Guardian Content API considers this to be 'content'. Interactives, liveblogs, gallery pages, podcast pages, video pages and crosswords are all content. These are not: fronts (including the UK network front, but also section fronts, e.g. the UK business front https://www.theguardian.com/uk/business) and tag pages (e.g. https://www.theguardian.com/tv-and-radio/doctor-who). | optional |
16 | utmParameters | utmparameters.UtmParameters | If included, the UTM parameters present in the query string of the page view (https://en.wikipedia.org/wiki/UTM_parameters). These are kept separate from the query string params so that we can query by UTM params, and separate from Google referral because UTM params are not used exclusively by Google. | optional |

Details about the page that was served to the user
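The campaignCodes field collects the values of the CMP and INTCMP query parameters. The Python sketch below shows one plausible way to derive that set from a page URL. Two points are assumptions, since this document does not say how Ophan handles them: matching is case-sensitive, and blank values are dropped. The helper name campaign_codes is hypothetical.

```python
from typing import Set
from urllib.parse import parse_qsl, urlsplit

# Parameter names taken from the campaignCodes description; case-sensitive match assumed.
CAMPAIGN_PARAMS = ("CMP", "INTCMP")


def campaign_codes(url: str) -> Set[str]:
    """Collect the values of all CMP and INTCMP query parameters in a page URL."""
    query = urlsplit(url).query
    return {
        value
        for name, value in parse_qsl(query, keep_blank_values=False)
        if name in CAMPAIGN_PARAMS
    }
```

Using a set mirrors the set<string> type of the field: repeated parameters with the same value collapse to one entry.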


Struct: PageView

Key | Field | Type | Description | Requiredness | Default value
1 | validity | SuspectStatus | Whether we view this event as "suspect". Most consumers of this stream should ignore events that do not have a status of VALID. | optional | SuspectStatus.VALID
2 | page | Page | Details about the page displayed. | required |
3 | userAgent | ua.UserAgent | The user agent that made this request. | optional |
4 | location | geo.GeoLocation | Details about the location of the user who made this request. | optional |
7 | ipAddress | geo.IpAddress | IP details of the user who made this request. | optional |
5 | referrer | referrer.Referrer | Details about the referrer of this page view. Only present on _PAGE_VIEW events where a referer was received. | optional |
6 | httpStatus | i16 | The HTTP status returned to the user for this page view. | required | 200
8 | daysVisitedInLastWeek | i32 | The number of days in the previous week on which the device has had a page view recorded on the Guardian. | optional |
9 | totalDaysVisited | i32 | Total number of days on which this browser has visited the Guardian, ever (or at least since the slab's records began, in June 2015). Includes "today", so this number should never be less than 1. Accurately populated from 2016-06-09. | optional |
10 | averageDaysBetweenRecentVisits | double | Calculated average number of days between recent visits. Note that this is calculated in a way that discounts older visits: min(56, daysSinceFirstVisit + 1) / daysVisitedInLast56days. Accurately populated from 2016-06-09. | optional |
11 | regular | bool | Is this browser a regular visitor to the Guardian? i.e. totalDaysVisited >= 8 && averageDaysBetweenRecentVisits <= 7 | optional |
12 | subscriptionType | subscription.SubscriptionType | | optional |
13 | membershipTier | subscription.MembershipTier | | optional |
14 | navigationType | NavigationType | The navigation metadata type associated with this event. | optional |
15 | frequencyBucket | i16 | The frequency bucket associated with this event. | optional |

Details about a page view - only populated for _PAGE_VIEW
event types
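The averageDaysBetweenRecentVisits and regular fields follow directly from the formulas in the table above. The window is capped at 56 days, so a browser first seen fewer than 56 days ago is measured over its actual lifetime rather than the full window. For example, a browser first seen 100 days ago that visited on 28 of the last 56 days scores min(56, 101) / 28 = 2.0. The Python sketch below restates both rules. The snake_case parameter names mirror the formula's terms; the function names are hypothetical. The sketch assumes at least one visit in the window, which holds for any page view because "today" counts.

```python
def average_days_between_recent_visits(days_since_first_visit: int,
                                       days_visited_in_last_56_days: int) -> float:
    """min(56, daysSinceFirstVisit + 1) / daysVisitedInLast56days, as documented."""
    if days_visited_in_last_56_days <= 0:
        raise ValueError("expected at least one visit in the last 56 days")
    window = min(56, days_since_first_visit + 1)  # cap the look-back window at 56 days
    return window / days_visited_in_last_56_days


def is_regular(total_days_visited: int, average_days_between_visits: float) -> bool:
    """totalDaysVisited >= 8 && averageDaysBetweenRecentVisits <= 7."""
    return total_days_visited >= 8 and average_days_between_visits <= 7
```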


Struct: AttentionTime

Key | Field | Type | Description | Requiredness | Default value
1 | attentionMs | i64 | Attention time spent on this page view (indicated by event.pageViewId), in milliseconds. | required |
2 | componentAttention | map<string, i64> | Map of component name to the time in ms during which the attention time rules were satisfied and the component was visible. | optional |

Struct: AdInfo

Key | Field | Type | Description | Requiredness | Default value
1 | adBlockerDetected | bool | If present, indicates whether we detected ad-blocking technology. If absent, this check was not performed. If we have previously sent an event with this field present, you should treat that value as still valid. (Currently, the check is only performed on initial page load, so this field will only ever be set in the event relating to the initial page view, not in subsequent events relating to that page view.) | optional |
2 | ads | list<ads.RenderedAd> | Details of one or more ads rendered. Currently, each ad rendering is sent to Ophan in a separate event, so there will only ever be one entry in this list. This is likely to change in future, however. | optional |
3 | OBSOLETE_customSegments | list<string> | A list of custom segments assigned to this user, typically from Krux. Populated in the slab via DynamoDB. | optional |
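The adBlockerDetected rule has one consequence for consumers: once a value has been sent, it stays valid. For a given page view, you therefore carry the last known value forward instead of reading the field from each event in isolation. The Python sketch below shows that carry-forward. It assumes events are dicts for a single pageViewId, each with a "dt" timestamp and an optional "ads" dict. The helper name latest_ad_blocker_state is hypothetical.

```python
from typing import Iterable, Mapping, Optional


def latest_ad_blocker_state(events: Iterable[Mapping]) -> Optional[bool]:
    """Return the most recent adBlockerDetected value for one page view.

    None means the check was never performed for this page view.
    """
    state = None
    for event in sorted(events, key=lambda e: e["dt"]):  # oldest first
        ads = event.get("ads") or {}
        if "adBlockerDetected" in ads:
            state = ads["adBlockerDetected"]  # later values override earlier ones
    return state
```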

Struct: AssetPerformanceData

Key | Field | Type | Description | Requiredness | Default value
1 | name | string | The name of the metric being captured. | required |
2 | timing | i64 | Value of window.performance.now at which the metric was captured. | required |

Asset performance data, e.g. JavaScript


Struct: WebPerformanceData

Key | Field | Type | Description | Requiredness | Default value
1 | dns | i64 | Time in ms that the DNS lookup took. Calculated by t.domainLookupEnd - t.domainLookupStart. | required |
2 | connection | i64 | Time in ms that the connection to the server took. Calculated by t.connectEnd - t.connectStart. | required |
3 | firstByte | i64 | Time to first byte. Calculated by t.responseStart - t.connectEnd. | required |
4 | lastByte | i64 | First byte to last byte (or connection closed), including if served from cache. Calculated by t.responseEnd - t.responseStart. | required |
5 | domContentLoadedEvent | i64 | From the last byte of the document to the start of DOMContentLoaded. Calculated by t.domContentLoadedEventStart - t.responseEnd. | required |
6 | loadEvent | i64 | From document loaded to the start of the load event. Calculated by t.loadEventStart - t.domContentLoadedEventStart. | required |
7 | navType | i64 | The navigation type: the value of window.performance.navigation.type. | required |
8 | redirectCount | i64 | Number of redirects on the current domain: the value of window.performance.navigation.redirectCount. | required |
9 | assetsPerformance | list<AssetPerformanceData> | List of key-value pairs representing custom asset performance data, e.g. the time at which JavaScript begins or finishes executing. | optional |

Web performance data as captured from the browser Performance Timing API.
In the field descriptions, "t" represents window.performance.timing.
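The six derived durations are plain differences between PerformanceTiming attributes. The Python sketch below computes them from a snapshot of window.performance.timing, assumed here to be serialised as a dict keyed by attribute name. The function name is hypothetical. navType and redirectCount are copied verbatim from window.performance.navigation, so they are omitted.

```python
from typing import Dict, Mapping


def web_performance_data(t: Mapping[str, int]) -> Dict[str, int]:
    """Derive WebPerformanceData durations (ms) from a performance.timing snapshot."""
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "connection": t["connectEnd"] - t["connectStart"],
        "firstByte": t["responseStart"] - t["connectEnd"],
        "lastByte": t["responseEnd"] - t["responseStart"],
        "domContentLoadedEvent": t["domContentLoadedEventStart"] - t["responseEnd"],
        "loadEvent": t["loadEventStart"] - t["domContentLoadedEventStart"],
    }
```

Note that window.performance.timing and window.performance.navigation are deprecated in browsers in favour of PerformanceNavigationTiming. The attribute names used above still exist on the newer interface, but there they are relative to the start of navigation rather than epoch timestamps. The differences computed here are unaffected by that change.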


Struct: Component

Key | Field | Type | Description | Requiredness | Default value
1 | name | string | The name of this component. | required |
2 | loadTimeMs | i64 | How long, in milliseconds, the component took to load. If absent, the component had not completed loading by the time the snapshot of lazy components was taken. | optional |

Struct: LazyComponents

Key | Field | Type | Description | Requiredness | Default value
1 | components | set<Component> | The set of components that were loaded lazily on this page. Note: statically loaded components are reported in Page.components. | required |
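Because an absent loadTimeMs means "not finished when the snapshot was taken", a consumer should separate those components instead of treating them as zero-millisecond loads. The Python sketch below mirrors Component as a hypothetical frozen dataclass. Frozen dataclasses are hashable, so they can live in a set, as the set<Component> type requires. The sketch returns finished components ordered by load time and the names of those still pending. The component names in any usage are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    load_time_ms: Optional[int] = None  # None: not loaded when the snapshot was taken


def split_by_completion(components: Iterable[Component]) -> Tuple[List[Component], List[str]]:
    """Separate lazily loaded components into finished ones and names still pending."""
    items = list(components)  # allow single-pass iterables such as generators
    loaded = sorted((c for c in items if c.load_time_ms is not None),
                    key=lambda c: c.load_time_ms)
    pending = sorted(c.name for c in items if c.load_time_ms is None)
    return loaded, pending
```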

Struct: AltIds

Key | Field | Type | Description | Requiredness | Default value
1 | OBSOLETE_OMNITURE_s_vi | string | OBSOLETE: the Guardian stopped using Omniture analytics in 2016. The Omniture user identifier, stored in the s_vi cookie. This was only ever available on web events, not native app events. | optional |
2 | OBSOLETE_OMNITURE_s_sess | string | OBSOLETE: the Guardian stopped using Omniture analytics in 2016. The Omniture session identifier, stored in the s_sess cookie. This was only ever available on web events, not native app events. | optional |
3 | OBSOLETE_kruxId | string | DEPRECATED: used to hold the user's Krux identifier. | optional |
4 | OBSOLETE_exactTargetSubscriberId | string | DEPRECATED: Exact Target is no longer used. The Exact Target email subscriber id. Page views in response to readers clicking on an email sent by Exact Target include the Exact Target subscriber id, and this field is populated in that case. | optional |
5 | ampViewId | string | See the notes on ampViewId below. | optional |

Notes on ampViewId:

Some form of hopefully-unique id for page views made on the Google AMP ('Accelerated Mobile Pages') platform (e.g. https://amp.theguardian.com/us-news/commentisfree/2016/feb/16/thomas-piketty-bernie-sanders-us-election-2016). The AMP platform gets the definition of ampViewId, along with a lot of other stuff, from https://ophan.theguardian.com/amp.json.

We've historically had a hard time getting a unique page view identifier out of the AMP platform. The PAGE_VIEW_ID field supplied by the AMP platform was not as unique as we might have thought, so the ampViewId value sent to Ophan's tracker was an amalgam of as many differing fields as we could get our hands on, including PAGE_VIEW_ID. However, in October 2020, Ophan PR #3955 took advantage of the newly introduced PAGE_VIEW_ID_64 field to finally get a properly unique page view id. See also https://github.com/ampproject/amphtml/issues/12674 and the AMP Project design-review comment: "Low entropy PAGE_VIEW_ID was to avoid any privacy implication. But we think it's fine to introduce a high entropy one now."

The original purpose of this field, when it was added in 2019, was to allow Data & Insight (D&I) to tie page views to the ad view data from 'Google Ad Manager' (GAM, formerly known as 'DoubleClick for Publishers', or DFP). When this was reviewed in October 2020, it was posited that this had never happened, as no evidence was found in source control of a defined data-processing job performing that link (it is not entirely clear whether the search missed ad-hoc use by D&I). The primary interest in the field then became joining with consent data in the Sourcepoint consent management system.

Struct: PageViewAdditional

Key | Field | Type | Description | Requiredness | Default value
1 | renderedComponents | set<string> | The set of component names that were rendered on this page. Currently only reported by next-gen, this can be used to validate the effectiveness of particular components. | optional |
2 | experiences | set<string> | Information about the rendering of the page. May contain multiple comma-separated experiences. | optional |

Struct: Event

Key | Field | Type | Description | Requiredness | Default value
2 | uniqueEventId | string | Globally unique id associated with this event. Ophan never promises better than at-least-once delivery, so you must ensure that processing an event whose uniqueEventId you have already seen has no further effect. | required |
3 | dt | i64 | The date-time (in millis since epoch, UTC) at which this event occurred. | required |
10 | receivedDt | i64 | The date-time (in millis since epoch, UTC) at which this event was received by Ophan for processing. For web-generated events, this is the same as dt. For events generated by native mobile apps, it might not be. | required |
4 | pageViewId | string | The page view with which this event is associated. Ophan may send multiple events relating to the same page view, which may contain updates to previously supplied data or new data. You should treat the one with the highest timestamp (dt) as the most accurate. | required |
5 | browserId | AssignedId | The unique id associated with this browser. Currently this is maintained by setting a cookie for web events, or otherwise determined for native apps. | optional |
6 | visitId | AssignedId | DEPRECATED: the unique id associated with this "visit". For web reports, the visit id is a refreshed session cookie that expires after 30 minutes of activity. Mobile apps do not currently set this value. | optional |
7 | userId | string | If the user is logged in, the identity user id. | optional |
17 | altIds | AltIds | Various other identifiers we may have for this user. | optional |
8 | pageView | PageView | If populated, this event represents a page view. | optional |
9 | attention | AttentionTime | If populated, this event includes attention time data. Note that this will also be populated, typically with a value of zero, alongside a pageView value for page views generated by platforms that support attention time tracking. | optional |
11 | ads | AdInfo | If populated, this event includes advertising-related information. | optional |
12 | perf | WebPerformanceData | If populated, this event includes web performance load information. | optional |
13 | media | media.MediaPlayback | If populated, this event includes data about media playback. | optional |
14 | ab | abtest.AbTestInfo | If populated, this event includes data about A/B tests that the user was a member of. | optional |
15 | lazyComponents | LazyComponents | If populated, this event includes data about components that were lazily loaded. | optional |
16 | quizEvent | quiz.QuizEvent | If populated, this event includes data about a quiz event. | optional |
18 | inPageClick | inpageclick.InPageClick | If populated, this event includes data about a click that did not result in a page transition. | optional |
20 | interaction | interaction.Interaction | | optional |
21 | inPrivateBrowsingMode | bool | Is this browser in private browsing mode? e.g. true for incognito mode in Google Chrome. | optional |
22 | ipConnectivity | ipv6.IpConnectivity | If populated, includes data about IPv6 connectivity. | optional |
23 | acquisition | acquisition.Acquisition | Acquisition of one of our current products, e.g. Contribution, Membership, Recurring Contribution. | optional |
24 | componentEvent | componentevent.ComponentEvent | An event relating to a component, e.g. a user clicking on the contribution CTA of an Epic component, or an Atom component being inserted into a page. | optional |
25 | additional | PageViewAdditional | After an initial page view is recorded, we can send a follow-up WEB_ADDITIONAL submission containing other information collected later in the page view's lifetime, such as asynchronously loaded components. That information is stored here. | optional |
26 | sessionId | string | Identity user session identifier. We can use this value to identify the session a logged-in user was using to view the page, which helps us keep track of how often users log in and out. | optional |
27 | edition | nativeapp.Edition | If populated, the selected edition recorded at the time of this event ('TrackerSubmission.IndividualSubmission.NativeAppSubmission.App.Edition'). | optional |
28 | consent | consent.Consent | After an initial page view is recorded, we can send follow-up CONSENT submissions to capture the reader's consent. The reader can change their consent during the same page view. | optional |
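Two consumer obligations follow from the required fields above. First, delivery is at-least-once, so repeated uniqueEventIds must be harmless. Second, among events for one pageViewId, the highest dt is the most accurate. The Python sketch below applies both in a deliberately simple latest-wins form, assuming events are dicts keyed by the Thrift field names. The helper name latest_per_page_view is hypothetical. Later events may carry only some fields (see the AdInfo note on adBlockerDetected), so a real consumer would likely merge fields across a page view's events rather than discard the earlier ones.

```python
from typing import Dict, Iterable, Mapping


def latest_per_page_view(events: Iterable[Mapping]) -> Dict[str, Mapping]:
    """Skip redelivered events, then keep the newest event for each page view.

    Events are assumed to be dicts with uniqueEventId, pageViewId and dt.
    On equal dt, the first event seen is kept.
    """
    seen = set()
    latest: Dict[str, Mapping] = {}
    for event in events:
        if event["uniqueEventId"] in seen:
            continue  # duplicate delivery: processing it again must have no effect
        seen.add(event["uniqueEventId"])
        current = latest.get(event["pageViewId"])
        if current is None or event["dt"] > current["dt"]:
            latest[event["pageViewId"]] = event
    return latest
```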