API documentation

This documentation is based on the source code of version 4.0.2 of the chat-archive package. The following modules are available:

chat_archive

Python API for the chat-archive program.

chat_archive.DEFAULT_ACCOUNT_NAME = 'default'

The name of the default account (a string).

class chat_archive.ChatArchive(*args, **kw)[source]

Python API for the chat-archive program.

You can set the values of the data_directory, database_file and force properties by passing keyword arguments to the class initializer.

Here’s an overview of the ChatArchive class:

Superclass: SchemaManager
Public methods: commit_changes(), get_accounts_for_backend(), get_accounts_from_config(), get_accounts_from_database(), get_backend_name(), get_backends_and_accounts(), initialize_backend(), is_operator(), load_backend_module(), parse_account_expression(), search_messages() and synchronize()
Properties: alembic_directory, backends, config, config_loader, data_directory, database_file, declarative_base, force, import_stats, num_contacts, num_conversations, num_html_messages, num_messages and operator_name
alembic_directory

The pathname of the directory containing Alembic migration scripts (a string).

The value of this property is computed at runtime based on the value of __file__ inside of the chat_archive/__init__.py module.

backends[source]

A dictionary of available backends (names and dotted paths).

>>> from chat_archive import ChatArchive
>>> archive = ChatArchive()
>>> print(archive.backends)
{'gtalk': 'chat_archive.backends.gtalk',
 'hangouts': 'chat_archive.backends.hangouts',
 'slack': 'chat_archive.backends.slack',
 'telegram': 'chat_archive.backends.telegram'}

Note

The backends property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

config[source]

A dictionary with general user defined configuration options.

Note

The config property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

config_loader[source]

A ConfigLoader object that provides access to the configuration.

Configuration files are text files in the subset of ini syntax supported by Python’s configparser module. They can be located in the following places:

Directory    Main configuration file       Modular configuration files
/etc         /etc/chat-archive.ini         /etc/chat-archive.d/*.ini
~            ~/.chat-archive.ini           ~/.chat-archive.d/*.ini
~/.config    ~/.config/chat-archive.ini    ~/.config/chat-archive.d/*.ini

The available configuration files are loaded in the order given above, so that user specific configuration files override system wide configuration files.
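
The override behaviour can be illustrated with Python's configparser module alone. This is a minimal sketch, not the ConfigLoader implementation: the helper load_in_order() and the temporary filenames are made up for the example, and the only behaviour relied upon is that configparser lets values from later files replace values from earlier files.

```python
import configparser
import tempfile
from pathlib import Path


def load_in_order(filenames):
    """Read ini files in the given order; later files override earlier ones."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() skips files that don't exist and lets values
    # from later files replace values from earlier files.
    parser.read(filenames)
    return parser


with tempfile.TemporaryDirectory() as directory:
    # Hypothetical stand-ins for a system wide and a user specific file.
    system_wide = Path(directory, "system-wide.ini")
    user_specific = Path(directory, "user-specific.ini")
    system_wide.write_text("[chat-archive]\noperator-name = System Default\n")
    user_specific.write_text("[chat-archive]\noperator-name = Jane Doe\n")
    merged = load_in_order([system_wide, user_specific])
    operator_name = merged["chat-archive"]["operator-name"]
    print(operator_name)
```

Because the user specific file is read last, its operator-name value is the one that remains.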

Note

The config_loader property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

declarative_base

The base class for declarative models defined using SQLAlchemy.

data_directory[source]

The pathname of the directory where data files are stored (a string).

The environment variable $CHAT_ARCHIVE_DIRECTORY can be used to set the value of this property. When the environment variable isn’t set the default value ~/.local/share/chat-archive is used (where ~ is expanded to the profile directory of the current user).
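
The lookup described above could be sketched as follows. The function name resolve_data_directory() is hypothetical, and expanding ~ in a value taken from the environment variable is an assumption of this sketch rather than documented behaviour.

```python
import os

DEFAULT_DATA_DIRECTORY = "~/.local/share/chat-archive"


def resolve_data_directory(environ=None):
    """Pick the data directory from the environment or fall back to the default."""
    environ = os.environ if environ is None else environ
    # Use the environment variable when it is set, the default otherwise.
    value = environ.get("CHAT_ARCHIVE_DIRECTORY", DEFAULT_DATA_DIRECTORY)
    # Assumption: a leading ~ is expanded in both cases.
    return os.path.expanduser(value)


print(resolve_data_directory({"CHAT_ARCHIVE_DIRECTORY": "/srv/chat-archive"}))
```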

Note

The data_directory property is a custom_property. You can change the value of this property using normal attribute assignment syntax. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

database_file[source]

The absolute pathname of the SQLite database file (a string).

This defaults to ~/.local/share/chat-archive/database.sqlite3 (with ~ expanded to the home directory of the current user) based on data_directory.

Note

The database_file property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

force[source]

Retry synchronization of conversations where errors were previously encountered (a boolean, defaults to False).

Note

The force property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

import_stats[source]

Statistics about objects imported by backends (a BackendStats object).

Note

The import_stats property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

num_contacts

The total number of chat contacts in the local archive (a number).

num_conversations

The total number of chat conversations in the local archive (a number).

num_html_messages

The total number of chat messages with HTML formatting in the local archive (a number).

num_messages

The total number of chat messages in the local archive (a number).

operator_name[source]

The full name of the person using the chat-archive program (a string or None).

The value of operator_name is used to address the operator of the chat-archive program in first person instead of third person. You can change the value in the configuration file:

[chat-archive]
operator-name = ...

The default value in case none has been specified in the configuration file is taken from /etc/passwd using get_full_name().

Note

The operator_name property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

commit_changes()[source]

Show import statistics when committing database changes to disk.

get_accounts_for_backend(backend_name)[source]

Select the configured and/or previously synchronized account names for the given backend.

get_accounts_from_database(backend_name)[source]

Get the names of the accounts that are already in the database for the given backend.

get_accounts_from_config(backend_name)[source]

Get the names of the accounts configured for the given backend in the configuration file.

get_backend_name(backend_name)[source]

Get a human friendly name for the given backend.

get_backends_and_accounts(*backends)[source]

Select backends and accounts to synchronize.

initialize_backend(backend_name, account_name)[source]

Load a chat archive backend module and initialize the backend that it defines.

Parameters:
  • backend_name – The name of the backend (one of the strings ‘gtalk’, ‘hangouts’, ‘slack’ or ‘telegram’).
  • account_name – The name of the account (a string).
Returns:

A ChatArchiveBackend object.

Raises:

Exception when the backend doesn’t define a subclass of ChatArchiveBackend.

is_operator(contact)[source]

Check whether the full name of the given contact matches operator_name.

load_backend_module(backend_name)[source]

Load a chat archive backend module.

Parameters: backend_name – The name of the backend (one of the strings ‘gtalk’, ‘hangouts’, ‘slack’ or ‘telegram’).
Returns: The loaded module.

parse_account_expression(value)[source]

Parse a backend:account expression.

Parameters: value – The backend:account expression (a string).
Returns: A tuple with two values:
  1. The name of a backend (a string).
  2. The name of an account (a string, possibly empty).
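
A minimal sketch of such a parser is shown below. The function name split_account_expression() is hypothetical and splitting on the first colon is an assumption; the actual method may validate or normalize its input differently.

```python
def split_account_expression(value):
    """Split a backend:account expression into (backend_name, account_name)."""
    # str.partition() returns an empty account name when there is no colon.
    backend_name, _, account_name = value.partition(":")
    return backend_name, account_name


print(split_account_expression("slack:work"))
print(split_account_expression("telegram"))
```
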
search_messages(keywords)[source]

Search the chat messages in the local archive for the given keyword(s).

synchronize(*backends)[source]

Download new chat messages.

Parameters: backends – Any positional arguments limit the synchronization to backends whose name matches one of the strings provided as positional arguments.

If the name of a backend contains a colon the name is split into two:

  1. The backend name.
  2. An account name.

This way one backend can synchronize multiple named accounts into the same local database without causing confusion during synchronization about which conversations, contacts and messages belong to which account.

class chat_archive.BackendStats[source]

Statistics about chat message synchronization backends.

__init__()[source]

Initialize a BackendStats object.

__enter__()[source]

Alias for push().

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Alias for pop().

__getattr__(name)[source]

Get the value of a counter from the current scope.

__setattr__(name, value)[source]

Set the value of a counter in the current scope.

pop()[source]

Remove the inner scope and merge its counters into the outer scope.

push()[source]

Create a new inner scope with all counters reset to zero.

show()[source]

Show statistics about imported conversations, messages, contacts, etc.

scope

The current scope (a collections.defaultdict object).
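
The scoping behaviour of push() and pop() can be sketched with a stack of collections.defaultdict objects. The class below is an illustration under that assumption, not the BackendStats source code; the attribute access provided by __getattr__() and __setattr__() is left out to keep the sketch short, and the counter name messages_added is made up.

```python
import collections


class ScopedCounters:
    """Counters with nested scopes that merge into their parent when popped."""

    def __init__(self):
        self.stack = [collections.defaultdict(int)]

    @property
    def scope(self):
        # The current scope is the innermost (most recently pushed) one.
        return self.stack[-1]

    def push(self):
        # Create a new inner scope with all counters reset to zero.
        self.stack.append(collections.defaultdict(int))

    def pop(self):
        # Remove the inner scope and merge its counters into the outer scope.
        inner = self.stack.pop()
        for name, value in inner.items():
            self.scope[name] += value

    def __enter__(self):
        self.push()
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        self.pop()


stats = ScopedCounters()
stats.scope["messages_added"] += 2
with stats:
    stats.scope["messages_added"] += 3
    inner_count = stats.scope["messages_added"]
outer_count = stats.scope["messages_added"]
print(inner_count, outer_count)
```

Inside the with block only the three messages counted in the inner scope are visible; after the block they have been merged into the outer scope.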

chat_archive.backends

Namespace for chat archive backends.

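The following chat archive backends have been implemented so far:

  • chat_archive.backends.gtalk
  • chat_archive.backends.hangouts
  • chat_archive.backends.slack
  • chat_archive.backends.telegram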

class chat_archive.backends.ChatArchiveBackend(**kw)[source]

Abstract base class for chat-archive backends.

When you initialize a ChatArchiveBackend object you are required to provide values for the account_name, archive, backend_name and stats properties by passing keyword arguments to the class initializer.

Here’s an overview of the ChatArchiveBackend class:

Superclass: PropertyManager
Public methods: find_contact_by_attributes(), find_contact_by_email_address(), find_contact_by_external_id(), find_contact_by_telephone_number(), get_or_create_contact(), get_or_create_conversation(), get_or_create_email_address(), get_or_create_message(), get_or_create_object(), get_or_create_telephone_number(), have_message(), pre_process_text() and synchronize()
Properties: account, account_name, archive, backend_name, config, external_id_cache, redirect_stripper, session and stats
account[source]

The Account object corresponding to account_name and backend_name.

Note

The account property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

account_name[source]

The name of the chat account that is being synchronized (a string).

The value of account_name needs to be set by the caller and is used to “get or create” the account object on demand.

Note

The account_name property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named account_name (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

archive[source]

The ChatArchive that is using this backend.

Note

The archive property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named archive (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

backend_name[source]

The name of the chat archive backend (a short alphanumeric string).

The value of backend_name is used to “get or create” the account object on demand.

Note

The backend_name property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named backend_name (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

config[source]

The configuration options for this backend and account (a dictionary).

Note

The config property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

external_id_cache[source]

A dictionary mapping external IDs to Contact objects.

Note

The external_id_cache property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

redirect_stripper[source]

A RedirectStripper object.

Note

The redirect_stripper property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

session[source]

Shortcut for the session property of archive.

Note

The session property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

stats[source]

A BackendStats object.

Note

The stats property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named stats (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

find_contact_by_attributes(attributes)[source]

Find a contact based on their external ID, an email address or a telephone number.

Parameters: attributes – A dictionary with any of the following keys:

  • external_id (string value)
  • email_addresses (list of strings)
  • telephone_numbers (list of strings)

Returns: A Contact object or None.

find_contact_by_email_address(value)[source]

Find a contact based on their email address.

Parameters: value – An email address (a string).
Returns: A Contact object or None.

find_contact_by_external_id(external_id)[source]

Find a contact based on their ‘external ID’.

Parameters: external_id – The external ID (a string).
Returns: A Contact object or None.

This method uses external_id_cache to speed up lookup of contacts by their external ID.

find_contact_by_telephone_number(value)[source]

Find a contact based on their telephone number.

Parameters: value – A telephone number (a string).
Returns: A Contact object or None.

get_or_create_contact(**attributes)[source]

Get or create a contact object.

Parameters: attributes – The names and values of model attributes, used to find existing contacts and create new ones.
Returns: A Contact object.

This method serves three distinct purposes:

  1. Finding existing contacts by their ‘external ID’ or one of their email addresses or telephone numbers.
  2. Creating new contacts (based on the given attributes).
  3. Updating existing contacts (based on the given attributes).

Here’s an overview of supported attributes:

  • The external_id attribute (whose value is expected to be string).
  • The full_name attribute (whose value is expected to be string) is split into separate first_name and last_name attributes.
  • The attributes email_address and telephone_number (whose value is expected to be string) are converted to their plural forms email_addresses and telephone_numbers (a list of strings).
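
The attribute handling listed above could look roughly like the sketch below. The function name normalize_contact_attributes() is hypothetical, and splitting full_name at the first space is an assumption made for the example; the documentation doesn't state how the name is divided.

```python
def normalize_contact_attributes(**attributes):
    """Rewrite convenience attributes into the forms described above."""
    normalized = dict(attributes)
    full_name = normalized.pop("full_name", None)
    if full_name:
        # Assumption: everything after the first space is the last name.
        first_name, _, last_name = full_name.strip().partition(" ")
        normalized["first_name"] = first_name
        normalized["last_name"] = last_name.strip()
    for singular, plural in (
        ("email_address", "email_addresses"),
        ("telephone_number", "telephone_numbers"),
    ):
        value = normalized.pop(singular, None)
        if value:
            # Copy the list so that the caller's list is never modified.
            normalized[plural] = list(normalized.get(plural, [])) + [value]
    return normalized


print(normalize_contact_attributes(full_name="Jane van Doe", email_address="jane@example.com"))
```
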
get_or_create_conversation(external_id, **attributes)[source]

Get or create a Conversation object.

Parameters:
  • external_id – The external ID of the conversation (a string).
  • attributes – Any optional attributes to set when creating a new conversation.
Returns:

Refer to get_or_create_object().

get_or_create_message(conversation, **attributes)[source]

Get or create a Message object.

Parameters:
  • conversation – The Conversation in which the message originated.
  • attributes – Any optional attributes to set when creating a new message.
Returns:

Refer to get_or_create_object().

get_or_create_email_address(email_address)[source]

Get or create an EmailAddress object.

Parameters: email_address – The email address (a string).
Returns: An EmailAddress object.

get_or_create_object(model, required, optional=None)[source]

Find an existing object in the local database or create a new object.

Parameters:
  • model – The model to query.
  • required – A dictionary with the key/value pairs that should be used to search for an existing object.
  • optional – Any optional attributes to set when creating a new object.
Returns:

A tuple with two values:

  1. True if the object was created, False if it already existed.
  2. The object (an instance of model).

get_or_create_telephone_number(telephone_number)[source]

Get or create a TelephoneNumber object.

Parameters: telephone_number – The telephone number (a string containing a number).
Returns: A TelephoneNumber object.

have_message(conversation, external_id)[source]

Check if a message exists in the local database.

Parameters:
  • conversation – The Conversation that contains the message.
  • external_id – The unique id of the message (a string).
Returns:

True when the message exists, False if it doesn’t.

pre_process_text(attributes)[source]

Pre-process the text and HTML of a chat message.

Parameters: attributes – A dictionary with Message attributes.

This method works as follows:

  1. The text is pre-processed using strip_redirects().
  2. The html is pre-processed using RedirectStripper.
  3. When the resulting HTML exactly equals the plain text chat message, the html key in attributes is removed.
synchronize()[source]

This instance method must be implemented by subclasses.

chat_archive.backends.gtalk

Synchronization logic for the Google Talk backend of the chat-archive program.

The Google Talk backend uses the IMAP protocol to discover and download the messages available in the chats_folder of your Google Mail account. The following requirements need to be met in order to use this backend:

  • You need to enable IMAP access to your Google Mail account.
  • You may need to specifically enable IMAP access to the chats_folder (this turned out to be necessary for me).

Before developing this module in June 2018 I had never implemented any IMAP automation [1] so I wasn’t that familiar with the protocol and I didn’t know about message UIDs. The Unique ID in IMAP protocol blog post provided me with some useful details about the semantics of message UIDs.

This backend assumes and requires that the Google Mail servers provide message UIDs that are stable across sessions (this enables discovery of new messages). My testing implies that this is the case, because it seems to work fine! :-)

[1] Despite operating my own IMAP server for the past ten years, so I was already familiar with IMAP from the perspective of a user as well as server administrator.

chat_archive.backends.gtalk.FRIENDLY_NAME = 'Google Talk'

A user friendly name for the chat service supported by this backend (a string).

chat_archive.backends.gtalk.NAMESPACED_TAG_PATTERN = re.compile('^{[^}]+}(\\S+)$')

Compiled regular expression to match XML tag names with a name space.
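
Python's xml.etree.ElementTree module reports namespaced tags in the form {namespace}name, which is the shape this pattern matches. The sketch below shows what the pattern captures; the helper strip_namespace() is hypothetical and how the backend applies the pattern is an assumption.

```python
import re

# Same pattern as documented above.
NAMESPACED_TAG_PATTERN = re.compile('^{[^}]+}(\\S+)$')


def strip_namespace(tag):
    """Return the local tag name, dropping a leading {namespace} if present."""
    match = NAMESPACED_TAG_PATTERN.match(tag)
    return match.group(1) if match else tag


local_name = strip_namespace("{jabber:client}message")
print(local_name)
```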

chat_archive.backends.gtalk.BOGUS_EMAIL_PATTERN = re.compile('^private-chat(-[0-9a-f]+)+@groupchat.google.com$', re.IGNORECASE)

Compiled regular expression to recognize private messages in group conversations.
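
The following sketch shows which addresses the pattern accepts. The helper is_bogus_email() and the sample addresses are made up for illustration.

```python
import re

# Same pattern and flag as documented above.
BOGUS_EMAIL_PATTERN = re.compile(
    '^private-chat(-[0-9a-f]+)+@groupchat.google.com$', re.IGNORECASE
)


def is_bogus_email(address):
    """Check whether an address looks like a private message in a group chat."""
    return BOGUS_EMAIL_PATTERN.match(address) is not None


print(is_bogus_email("private-chat-00-1A-ff@groupchat.google.com"))
print(is_bogus_email("jane@example.com"))
```

Thanks to re.IGNORECASE the hexadecimal groups may use upper case digits.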

class chat_archive.backends.gtalk.GoogleTalkBackend(**kw)[source]

The Google Talk backend for the chat-archive program.

This backend supports the following configuration options:

Option           Description
chats-folder     See chats_folder.
imap-server      See imap_server.
email            The email address used to sign in to your Google Mail account.
password-name    The name of a password in ~/.password-store to use.
password         See password.

If you set password-name then password doesn’t have to be set. If neither password nor password-name has been set then you will be prompted for your password every time you synchronize.
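
The fallback between these options could be sketched as follows. The function resolve_password() and its two callbacks are hypothetical, and giving password-name precedence when both options are set is an assumption of this sketch.

```python
def resolve_password(options, lookup_password, prompt_for_password):
    """Pick a password from the password store, the configuration or a prompt."""
    # Assumption: password-name takes precedence when both options are set.
    if options.get("password-name"):
        return lookup_password(options["password-name"])
    if options.get("password"):
        return options["password"]
    # Neither option was set, so the operator is asked interactively.
    return prompt_for_password()


print(resolve_password({"password": "secret"}, lambda name: "from-store", lambda: "typed"))
```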

You can set the values of the chats_folder and imap_server properties by passing keyword arguments to the class initializer.

Here’s an overview of the GoogleTalkBackend class:

Superclass: ChatArchiveBackend
Public methods: check_response(), contact_from_header(), contact_from_jid(), contact_from_keywords(), extract_html(), extract_timestamp(), find_conversation(), find_uids_to_download(), find_uids_to_import(), get_email_body(), login_to_server(), parse_multipart_email(), parse_singlepart_email(), parse_xml(), select_chats_folder() and synchronize()
Properties: chats_folder, client, conversation_map, imap_server and password
chats_folder[source]

The folder that contains chat message archives (a string, defaults to ‘[Gmail]/Chats’).

Note

The chats_folder property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

client[source]

An IMAP client connection to imap_server.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

conversation_map[source]

A mapping of conversations.

Note

The conversation_map property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

imap_server[source]

The domain name of the Google Mail IMAP server (a string, defaults to ‘imap.gmail.com’).

Note

The imap_server property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

password[source]

The password used to sign in to the Google Mail account (a string).

Note

The password property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

synchronize()[source]

Download RFC822 encoded Google Talk conversations using IMAP and import the embedded chat messages.

login_to_server()[source]

Log-in to the Google Mail account.

select_chats_folder()[source]

Select the IMAP folder with chat messages.

find_uids_to_download()[source]

Determine the UIDs of the email messages to be downloaded.

find_uids_to_import()[source]

Determine which email messages need to be imported.

get_email_body(uid)[source]

Get the body of an email from the local cache or the server.

parse_singlepart_email(email)[source]

Extract a chat message from a single-part email downloaded from chats_folder.

parse_multipart_email(email)[source]

Find the text/xml payload in an RFC 822 multi-part email message.

parse_xml(xml_body, conversation)[source]

Extract chat messages from the text/xml payload.

find_conversation(*participants)[source]

Find a conversation (without an external ID) that involves the given participants.

extract_timestamp(message_node)[source]

Extract a timestamp from a <message> node.

Parameters: message_node – A <message> node.
Returns: A datetime.datetime object.

extract_html(message_node)[source]

Try to extract HTML from a <message> node.

Parameters: message_node – A <message> node.
Returns: The extracted HTML (a string) or None.

contact_from_jid(value)[source]

Convert a Jabber ID to an email address and use that to find or create a contact.

contact_from_keywords(keywords)[source]

Try to find a unique contact based on the given keywords.

contact_from_header(value)[source]

Get or create a contact based on the From: or To: header of an email.

check_response(response, message, *args, **kw)[source]

Validate an IMAP server response.

class chat_archive.backends.gtalk.EmailMessageParser(**kw)[source]

Lazy evaluation of email.message_from_string().

When you initialize an EmailMessageParser object you are required to provide values for the raw_body and uid properties by passing keyword arguments to the class initializer.

Here’s an overview of the EmailMessageParser class:

Superclass: PropertyManager
Properties: parsed_body, raw_body, timestamp and uid
parsed_body[source]

The result of email.message_from_string().

Note

The parsed_body property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

raw_body[source]

The raw message body of the email (a string).

Note

The raw_body property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named raw_body (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

timestamp[source]

Convert the Date: header of the email message to a datetime object.

Note

The timestamp property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

uid[source]

The UID of the email message.

Note

The uid property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named uid (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.
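
The lazy evaluation that EmailMessageParser provides can be approximated with the standard library. The sketch below uses functools.cached_property as a stand-in for lazy_property; the class name LazyEmail and the sample message are made up.

```python
import email
import email.utils
from functools import cached_property


class LazyEmail:
    """Parse an email message only when the parsed form is first needed."""

    def __init__(self, raw_body, uid):
        self.raw_body = raw_body
        self.uid = uid

    @cached_property
    def parsed_body(self):
        # Computed on first access, then cached on the instance.
        return email.message_from_string(self.raw_body)

    @cached_property
    def timestamp(self):
        # Convert the Date: header to a datetime object.
        return email.utils.parsedate_to_datetime(self.parsed_body["Date"])


message = LazyEmail(
    raw_body="Date: Fri, 01 Jun 2018 12:30:00 +0000\nSubject: Chat\n\nHello\n",
    uid=42,
)
print(message.timestamp.isoformat())
```

Messages whose parsed form is never requested are never parsed.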

class chat_archive.backends.gtalk.LazyXMLFormatter(node)[source]

Lazy evaluation of xml.etree.ElementTree.tostring().

__init__(node)[source]

Initialize a LazyXMLFormatter object.

Parameters: node – The XML node to render.

__bytes__()[source]

Convert the XML node to a byte string.

__str__()[source]

Convert the XML node to a string.
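
A wrapper like this is useful when a node is passed to a logging call: the node is only rendered if the log message is actually emitted. The sketch below shows the idea with a hypothetical LazyFormatter class; that the backend uses LazyXMLFormatter for logging is an assumption.

```python
import logging
import xml.etree.ElementTree as ET


class LazyFormatter:
    """Defer xml.etree.ElementTree.tostring() until the text is needed."""

    def __init__(self, node):
        self.node = node

    def __bytes__(self):
        return ET.tostring(self.node)

    def __str__(self):
        return ET.tostring(self.node, encoding="unicode")


node = ET.fromstring("<message><body>Hello</body></message>")
formatter = LazyFormatter(node)
# The %s placeholder is only filled in (calling __str__) when DEBUG is enabled.
logging.getLogger(__name__).debug("Parsing node: %s", formatter)
rendered = str(formatter)
print(rendered)
```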

chat_archive.backends.hangouts

Synchronization logic for the Google Hangouts backend of the chat-archive program.

chat_archive.backends.hangouts.FRIENDLY_NAME = 'Google Hangouts'

A user friendly name for the chat service supported by this backend (a string).

class chat_archive.backends.hangouts.HangoutsBackend(**kw)[source]

The Google Hangouts backend for the chat-archive program.

This backend supports the following configuration options:

Option           Description
email-address    The email address used to sign in to your Google account.
password-name    The name of a password in ~/.password-store to use.
password         The password used to sign in to your Google account.

If you set password-name then password doesn’t have to be set. If neither password nor password-name has been set then you will be prompted for your password every time you synchronize.

You can set the values of the cookie_file and retry_count properties by passing keyword arguments to the class initializer.

Here’s an overview of the HangoutsBackend class:

Superclass: ChatArchiveBackend
Public methods: connect_then_sync(), download_all_contacts(), download_all_conversations(), download_all_messages(), download_conversation(), download_message_batch(), get_message_html(), handle_import_errors(), is_bogus_user(), perform_initial_sync() and synchronize()
Properties: bogus_user_ids, client, cookie_file and retry_count
bogus_user_ids[source]

A set of strings with ‘gaia_id’ values of “bogus” users.

Note

The bogus_user_ids property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

cookie_file[source]

The pathname of the *.json file with cached credentials (a string).

Note

The cookie_file property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

client[source]

The hangups client object.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

retry_count[source]

The number of times that a batch of messages will be requested (a number, defaults to 5).

Note

The retry_count property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

synchronize()[source]

Download chat contacts and messages and store them in the local archive.

download_all_contacts(user_list)[source]

Download contact details from Google Hangouts.

get_message_html(event)[source]

Get the formatted text of a chat message as HTML.

is_bogus_user(user)[source]

Ignore default / unknown users made up by hangups.

connect_then_sync()[source]

Connect to the Hangouts service and start the synchronization.

download_all_conversations(conversation_list)[source]

Download conversations from Google Hangouts.

download_all_messages(conversation, conversation_in_db, event_id=None)[source]

Download the messages in a specific Hangouts conversation.

download_conversation(conversation)[source]

Download a single Google Hangouts conversation.

download_message_batch(conversation, event_id)[source]

Try to download a batch of messages (retrying according to retry_count).
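
A retry loop of this kind could be sketched as shown below. The function download_with_retries() and the flaky_fetch() callable are hypothetical; which exceptions the backend retries on and whether it waits between attempts is not documented here.

```python
def download_with_retries(fetch_batch, retry_count=5):
    """Call fetch_batch() up to retry_count times, returning the first result."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for _ in range(retry_count):
        try:
            return fetch_batch()
        except Exception as error:
            # Remember the failure and request the batch again.
            last_error = error
    # Every attempt failed, so report the most recent error.
    raise last_error


attempts = []


def flaky_fetch():
    """Fail twice with a temporary error, then return a batch."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return ["message-1", "message-2"]


batch = download_with_retries(flaky_fetch)
print(len(attempts), batch)
```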

handle_import_errors(conversation, conversation_in_db, event_id=None)[source]

Download messages in a conversation, handling synchronization errors.

perform_initial_sync(conversation, conversation_in_db)[source]

Perform the initial synchronization to the start of a conversation.

class chat_archive.backends.hangouts.GoogleAccountCredentials(**kw)[source]

Used to non-interactively provide Google Account credentials to hangups.

When you initialize a GoogleAccountCredentials object you are required to provide values for the email_address and password properties by passing keyword arguments to the class initializer.

Here’s an overview of the GoogleAccountCredentials class:

Superclass: PropertyManager
Public methods: get_email(), get_password() and get_verification_code()
Properties: email_address and password
email_address[source]

The Google account email address (a string).

Note

The email_address property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named email_address (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

password[source]

The Google account password (a string).

Note

The password property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named password (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

get_email()[source]

Feed the configured email_address to hangups.

get_password()[source]

Feed the configured password to hangups.

get_verification_code()[source]

Prompt the operator for a verification code.

chat_archive.backends.slack

Synchronization logic for the Slack backend of the chat-archive program.

chat_archive.backends.slack.FRIENDLY_NAME = 'Slack'

A user friendly name for the chat service supported by this backend (a string).

class chat_archive.backends.slack.SlackBackend(**kw)[source]

Container for the Slack chat archive backend.

You can set the value of the is_limited property by passing a keyword argument to the class initializer.

Here’s an overview of the SlackBackend class:

Superclass: ChatArchiveBackend
Public methods: expand_reference_callback(), get_history(), import_messages(), synchronize(), synchronize_channels(), synchronize_direct_messages() and synchronize_users()
Properties: api_token, client, http_session, is_limited, mrkdwn_to_html and spinner
api_token[source]

The Slack API token (a string).

Note

The api_token property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

client[source]

A slacker.Slacker instance initialized with api_token and http_session.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

is_limited[source]

Whether result sets have been limited due to the free plan.

Note

The is_limited property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

mrkdwn_to_html[source]

An HTMLConverter object.

Note

The mrkdwn_to_html property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

http_session[source]

A requests.Session object used for HTTP connection re-use.

Note

The http_session property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

spinner[source]

An interactive spinner to provide feedback to the user (because the Slack backend is slow).

Note

The spinner property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

synchronize()[source]

Download chat contacts and messages and store them in the local archive.

synchronize_users()[source]

Download information about the users in the organization on Slack.

synchronize_direct_messages()[source]

Download the latest direct messages from Slack.

synchronize_channels()[source]

Download messages from named channels.

import_messages(source, conversation_in_db)[source]

Import the history of the given Slack channel.

get_history(source, channel_id, latest=None, oldest=0, page_size=100)[source]

Get the history of the given Slack channel.

expand_reference_callback(external_id)[source]

Expand a @reference to a Slack user in a chat message with the name of that user.

class chat_archive.backends.slack.HTMLConverter(expand_reference_callback=None)[source]

Convert Slack chat messages from mrkdwn format to HTML.

__init__(expand_reference_callback=None)[source]

Initialize an HTMLConverter object.

__call__(text)[source]

Convert a Slack chat message to HTML.

Parameters: text – The text of a Slack message (a string).
Returns: The generated HTML (a string).

followed_by_alphanumeric(input, index, limit)[source]

Check if the given position is followed by an alphanumeric character.

parse_bold(input, index, length, output)[source]

Parse bold text.

parse_entity(input, index, length, output)[source]

Parse an HTML entity.

parse_italic(input, index, length, output)[source]

Parse _italic_ text.

parse_preformatted(input, index, length, output)[source]

Parse pre-formatted text.

parse_preformatted_body(input, index, length, output)[source]

Parse the body of a pre-formatted text fragment.

parse_reference(input, index, length, output)[source]

Parse a reference to a URL, user or channel.

parse_strike_through(input, index, length, output)[source]

Parse ~strike-through~ text.

parse_text(input, index, length, output)[source]

Parse inline text.

preceded_by_alphanumeric(input, index)[source]

Check if the given position is preceded by an alphanumeric character.
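
The general idea behind the parse_bold(), parse_italic() and parse_strike_through() methods, together with the alphanumeric checks, can be approximated with regular expressions. This is a rough sketch and not the parser used by HTMLConverter: the function name, the RULES table and the choice of output tags are all assumptions, and references, entities and pre-formatted text are ignored.

```python
import re


def _inline_pattern(marker):
    # A marker only counts when it is not glued to an alphanumeric character,
    # so identifiers such as snake_case_name are left alone.
    escaped = re.escape(marker)
    return re.compile(
        r"(?<![A-Za-z0-9])%s(.+?)%s(?![A-Za-z0-9])" % (escaped, escaped)
    )


# Assumed output tags; the tags emitted by the real converter may differ.
RULES = [
    (_inline_pattern("*"), r"<b>\1</b>"),
    (_inline_pattern("_"), r"<i>\1</i>"),
    (_inline_pattern("~"), r"<s>\1</s>"),
]


def mrkdwn_to_html_sketch(text):
    """Convert *bold*, _italic_ and ~strike-through~ markers to HTML tags."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(mrkdwn_to_html_sketch("*hi* _there_ ~old~"))
print(mrkdwn_to_html_sketch("snake_case_name"))
```

The second call shows why the checks for surrounding alphanumeric characters matter: underscores inside a word are not treated as italic markers.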

chat_archive.backends.telegram

Synchronization logic for the Telegram backend of the chat-archive program.

The use of this backend requires the user to register on my.telegram.org/apps to get an api_id and api_hash.

chat_archive.backends.telegram.FRIENDLY_NAME = 'Telegram'

A user friendly name for the chat service supported by this backend (a string).

class chat_archive.backends.telegram.TelegramBackend(**kw)[source]

Container for the Telegram chat archive backend.

When you initialize a TelegramBackend object you are required to provide values for the api_hash and api_id properties. You can set the values of the api_hash, api_id and session_file properties by passing keyword arguments to the class initializer.

Here’s an overview of the TelegramBackend class:

Superclass: ChatArchiveBackend
Public methods: connect_then_sync(), dialog_to_ignore(), download_messages(), is_duplicate_dialog(), is_group_conversation(), is_service_dialog(), perform_initial_sync(), recipient_to_contact(), sender_to_contact(), synchronize() and update_conversation()
Properties: api_hash, api_id, client and session_file
api_hash[source]

The API hash used to connect to the Telegram API (a string).

The value of this property can be configured as follows:

[telegram]
api-hash = ...

You can use the api-hash-name configuration file option to specify the name of a secret in ~/.password-store instead.

Note

The api_hash property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named api_hash (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

api_id[source]

The API ID used to connect to the Telegram API (an integer).

The value of this property can be configured as follows:

[telegram]
api-id = ...

You can use the api-id-name configuration file option to specify the name of a secret in ~/.password-store instead.

Note

The api_id property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named api_id (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.
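
As an illustration of the [telegram] configuration section shown above, the following sketch parses such a section with the standard library. The values are placeholders, and this is not necessarily how chat-archive loads its own configuration files; it only shows that api-id is expected to be an integer and api-hash a string.

```python
import configparser

# Placeholder values, not real credentials.
EXAMPLE_CONFIG = """
[telegram]
api-id = 12345
api-hash = 0123456789abcdef0123456789abcdef
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)

api_id = parser.getint("telegram", "api-id")   # an integer
api_hash = parser.get("telegram", "api-hash")  # a string
print(api_id, api_hash)
```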

client[source]

A telethon.TelegramClient object constructed based on api_id, api_hash and session_file.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

session_file[source]

The filename of the session file passed to telethon.TelegramClient.

Note

The session_file property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

synchronize()[source]

Download chat contacts and messages and store them in the local archive.

dialog_to_ignore(dialog)[source]

Check if this conversation should be ignored.

This method exists to exclude two types of conversations:

  • The conversation with the “Telegram” user, because I don’t consider the service messages in this conversation to be relevant to my chat archive.
  • Group conversations that are being synchronized as part of a different Telegram account.
is_duplicate_dialog(dialog)[source]

Check if the given dialog is being synchronized as part of a different Telegram account.

is_group_conversation(dialog)[source]

Determine whether the given dialog is a group conversation.

is_service_dialog(dialog)[source]

Check if the given dialog is the dialog with the “Telegram” user, containing service messages.

connect_then_sync()[source]

Connect to the Telegram API and synchronize the available conversations.

download_messages(dialog, conversation_in_db, min_id=0, max_id=0)[source]

Download messages in the given conversation.

perform_initial_sync(dialog, conversation_in_db)[source]

Start or resume the initial synchronization.

update_conversation(dialog, conversation_in_db)[source]

Download new messages in an existing conversation.

sender_to_contact(user)[source]

Create a contact in our local database for the given Telegram user.

recipient_to_contact(to_id)[source]

Create a contact in our local database for the given to_id value.

chat_archive.cli

Usage: chat-archive [OPTIONS] [COMMAND]

Easy to use offline chat archive that can gather chat message history from Google Talk, Google Hangouts, Slack and Telegram.

Supported commands:

  • The ‘sync’ command downloads new chat messages from supported chat services and stores them in the local archive (an SQLite database).
  • The ‘search’ command searches the chat messages in the local archive for the given keyword(s) and lists matching messages.
  • The ‘list’ command lists all messages in the local archive.
  • The ‘stats’ command shows statistics about the local archive.
  • The ‘unknown’ command searches for conversations that contain messages from an unknown sender and allows you to enter the name of a new contact to associate with all of the messages from an unknown sender. Conversations involving multiple unknown senders are not supported.

Supported options:

-C, --context=COUNT

    Print COUNT messages of output context during ‘chat-archive search’. This works similarly to ‘grep -C’. The default value of COUNT is 3.

-f, --force

    Retry synchronization of conversations where errors were previously encountered. This option is currently only relevant to the Google Hangouts backend, because I kept getting server errors when synchronizing a few specific conversations and I didn’t want to keep seeing each of those errors during every synchronization run :-).

-c, --color=CHOICE, --colour=CHOICE

    Specify whether ANSI escape sequences for text and background colors and text styles are to be used or not, depending on the value of CHOICE:

      • The values ‘always’, ‘true’, ‘yes’ and ‘1’ enable colors.
      • The values ‘never’, ‘false’, ‘no’ and ‘0’ disable colors.
      • When the value is ‘auto’ (this is the default) then colors will only be enabled when an interactive terminal is detected.

-l, --log-file=LOGFILE

    Save logs at DEBUG verbosity to the filename given by LOGFILE. This option was added to make it easy to capture the log output of an initial synchronization that will be downloading thousands of messages.

-p, --profile=FILENAME

    Enable profiling of the chat-archive application to make it possible to analyze performance problems. Python profiling data will be saved to FILENAME every time database changes are committed (making it possible to inspect the profile while the program is still running).

-v, --verbose

    Increase logging verbosity (can be repeated).

-q, --quiet

    Decrease logging verbosity (can be repeated).

-h, --help

    Show this message and exit.

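
The handling of the CHOICE values accepted by the color option can be sketched as follows. This is an illustration under assumptions, not the code used by chat-archive: the function name and the stream parameter are hypothetical, and raising ValueError for unsupported values is a choice made for the sketch.

```python
import sys

ENABLE = {"always", "true", "yes", "1"}
DISABLE = {"never", "false", "no", "0"}


def resolve_color_choice(choice="auto", stream=None):
    """Translate a CHOICE value into a boolean (hypothetical helper)."""
    if choice in ENABLE:
        return True
    if choice in DISABLE:
        return False
    if choice == "auto":
        # Only enable colors when the output stream is an interactive terminal.
        stream = sys.stdout if stream is None else stream
        isatty = getattr(stream, "isatty", None)
        return bool(isatty()) if callable(isatty) else False
    raise ValueError("unsupported color choice: %r" % (choice,))


print(resolve_color_choice("always"), resolve_color_choice("no"))
```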
chat_archive.cli.FORMATTING_TEMPLATES = {'conversation_delimiter': '<span style="color: green">{text}</span>', 'conversation_name': '<span style="font-weight: bold; color: #FCE94F">{text}</span>', 'keyword_highlight': '<span style="color: black; background-color: yellow">{text}</span>', 'message_backend': '<span style="color: #C4A000">({text})</span>', 'message_contacts': '<span style="color: blue">{text}</span>', 'message_delimiter': '<span style="color: #555753">{text}</span>', 'message_timestamp': '<span style="color: green">{text}</span>'}

The formatting of output, specified as HTML with placeholders.

chat_archive.cli.UNKNOWN_CONTACT_LABEL = 'Unknown'

The label for contacts without a name or email address (a string).

chat_archive.cli.main()[source]

Command line interface for the chat-archive program.

class chat_archive.cli.UserInterface(*args, **kw)[source]

The Python API for the command line interface for the chat-archive program.

You can set the values of the context, keywords, timestamp_format and use_colors properties by passing keyword arguments to the class initializer.

Here’s an overview of the UserInterface class:

Superclass: ChatArchive
Public methods: gather_context(), generate_html(), get_contact_name(), list_cmd(), normalize_whitespace(), prepare_output(), render_backend(), render_contacts(), render_conversation_summary(), render_messages(), render_output(), render_text(), render_timestamp(), search_cmd(), stats_cmd(), sync_cmd() and unknown_cmd()
Properties: context, html_to_ansi, html_to_text, keyword_highlighter, keywords, redirect_stripper, timestamp_format and use_colors
context[source]

The number of messages of output context to print during searches (defaults to 3).

Note

The context property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

use_colors[source]

Whether to output ANSI escape sequences for text colors and styles (a boolean).

Note

The use_colors property is a custom_property. You can change the value of this property using normal attribute assignment syntax. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

html_to_ansi[source]

An HTMLConverter object that uses normalize_emoji() as a text pre-processing callback.

Note

The html_to_ansi property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

redirect_stripper[source]

A RedirectStripper object.

Note

The redirect_stripper property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

html_to_text[source]

An HTMLStripper object.

Note

The html_to_text property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

keyword_highlighter[source]

A KeywordHighlighter object based on keywords.

Note

The keyword_highlighter property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

keywords[source]

A list of strings with search keywords.

Note

The keywords property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

timestamp_format[source]

The format of timestamps (defaults to %Y-%m-%d %H:%M:%S).

Note

The timestamp_format property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

list_cmd(arguments)[source]

List all messages in the local archive.

search_cmd(arguments)[source]

Search the chat messages in the local archive for the given keyword(s).

stats_cmd(arguments)[source]

Show some statistics about the local chat archive.

sync_cmd(arguments)[source]

Download new chat messages from the supported services.

unknown_cmd(arguments)[source]

Find private conversations with messages from an unknown sender and interactively prompt the operator to provide a name for a new contact to associate the messages with.

generate_html(name, text)[source]

Generate HTML based on a named format string.

Parameters:
  • name – The name of an HTML format string in FORMATTING_TEMPLATES (a string).
  • text – The text to interpolate (a string).
Returns:

The generated HTML (a string).

This method does not escape the text given to it, in other words it is up to the caller to decide whether embedded HTML is allowed or not.
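
A minimal sketch of this kind of template interpolation, assuming nothing more than str.format() on entries copied from FORMATTING_TEMPLATES. The function name is hypothetical and the real method may differ in its details. Because the text is not escaped, the caller escapes it first when embedded HTML is not wanted.

```python
import html

# Two entries copied from the documented FORMATTING_TEMPLATES value.
FORMATTING_TEMPLATES = {
    "keyword_highlight": '<span style="color: black; background-color: yellow">{text}</span>',
    "message_timestamp": '<span style="color: green">{text}</span>',
}


def generate_html_sketch(name, text):
    """Interpolate text into a named template without escaping it."""
    return FORMATTING_TEMPLATES[name].format(text=text)


# The caller decides whether to escape: here the text is escaped beforehand.
print(generate_html_sketch("message_timestamp", html.escape("<now>")))
```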

gather_context(messages)[source]

Enhance search results with context (surrounding messages).
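
The idea of adding surrounding messages, similar to ‘grep -C’, can be sketched on plain list positions. This is an assumption-laden illustration: the real method works with message objects from the archive, whereas the hypothetical function below only computes which positions in a conversation of a given length would be shown.

```python
def gather_context_sketch(matches, total, context=3):
    """Return the sorted positions of matches plus their neighbours.

    matches: positions of matching messages within one conversation.
    total:   the number of messages in that conversation.
    context: how many messages to include before and after each match.
    """
    selected = set()
    for index in matches:
        start = max(0, index - context)
        stop = min(total, index + context + 1)
        selected.update(range(start, stop))
    return sorted(selected)


print(gather_context_sketch([5], total=10, context=2))
```

Using a set means that overlapping context windows of nearby matches are merged instead of producing duplicate messages.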

render_messages(messages)[source]

Render the given message(s) on the terminal.

normalize_whitespace(text)[source]

Normalize the whitespace in a chat message before rendering on the terminal.

Parameters: text – The chat message text (a string).
Returns: The normalized text (a string).

This method works as follows:

  • First leading and trailing whitespace is stripped from the text.
  • When the resulting text consists of a single line, it is processed using compact() and returned.
  • When the resulting text contains multiple lines the text is prefixed with a newline character, so that the chat message starts on its own line. This ensures that messages requiring vertical alignment render properly (for example a table drawn with | and - characters).
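
The steps listed above can be sketched as follows. The function name is hypothetical, and collapsing runs of whitespace with split() and join() is a stand-in for compact(), whose exact behaviour is not described here.

```python
def normalize_whitespace_sketch(text):
    """Normalize whitespace following the three documented steps."""
    # Step 1: strip leading and trailing whitespace.
    stripped = text.strip()
    # Step 2: a single line is compacted (stand-in for compact()).
    if len(stripped.splitlines()) <= 1:
        return " ".join(stripped.split())
    # Step 3: multi-line text starts on its own line to keep its alignment.
    return "\n" + stripped


print(repr(normalize_whitespace_sketch("  hello   world  ")))
print(repr(normalize_whitespace_sketch("a | b\n--+--\n1 | 2\n")))
```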
render_conversation_summary(conversation)[source]

Render a summary of which conversation a message is part of.

render_contacts(message)[source]

Render a human friendly representation of a message’s contact(s).

prepare_output(text)[source]

Prepare text for rendering on the terminal.

Parameters: text – The HTML text to render (a string).
Returns: The rendered text (a string).

When use_colors is True this method first uses keyword_highlighter to highlight search matches in the given text and then it converts the string from HTML to ANSI escape sequences using html_to_ansi.

When use_colors is False then html_to_text is used to convert the given HTML to plain text. In this case keyword highlighting is skipped.

render_output(text)[source]

Render text on the terminal.

Parameters: text – The HTML text to render (a string).

Refer to prepare_output() for details about how text is converted from HTML to text with ANSI escape sequences.

get_contact_name(contact)[source]

Get a short string describing a contact (preferably their first name, but if that is not available then their email address will have to do). If no useful information is available UNKNOWN_CONTACT_LABEL is returned so as to explicitly mark the absence of more information.
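
The fallback order described above can be sketched like this. The ContactStub class and its first_name and email_address attributes are assumptions made for the illustration; in particular, the stub holds a single email address, which simplifies whatever the real contact model provides.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN_CONTACT_LABEL = "Unknown"


@dataclass
class ContactStub:
    """Hypothetical stand-in for a contact record."""

    first_name: Optional[str] = None
    email_address: Optional[str] = None


def get_contact_name_sketch(contact):
    # Prefer the first name, then the email address, then the explicit label.
    return contact.first_name or contact.email_address or UNKNOWN_CONTACT_LABEL


print(get_contact_name_sketch(ContactStub(email_address="someone@example.com")))
```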

render_text(message)[source]

Prepare the text of a chat message for rendering on the terminal.

render_timestamp(value)[source]

Render a human friendly representation of a timestamp.

render_backend(value)[source]

Render a human friendly representation of a chat message backend.

chat_archive.database

SQLAlchemy based database helpers.

class chat_archive.database.DatabaseClient(*args, **kw)[source]

Simple wrapper for SQLAlchemy that makes it easy to use with SQLite.

When you initialize a DatabaseClient object you are required to provide a value for the database_url property. You can set the values of the database_file, database_url and echo_queries properties by passing keyword arguments to the class initializer.

Here’s an overview of the DatabaseClient class:

Superclass: ProfileManager
Special methods: __exit__() and __init__()
Public methods: commit_changes()
Properties: database_engine, database_file, database_url, echo_queries, session and session_factory
__init__(*args, **kw)[source]

Initialize a DatabaseClient object.

Please refer to the PropertyManager documentation for details about the handling of arguments.

database_engine[source]

An SQLAlchemy database engine connected to database_url.

Note

The database_engine property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

database_file[source]

The absolute pathname of an SQLite database file (a string or None).

Note

The database_file property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

database_url[source]

A URL that indicates the database dialect and connection arguments to SQLAlchemy (a string).

The value of database_url defaults to a URL that instructs SQLAlchemy to use an SQLite 3 database file located at the pathname given by database_file, but of course you are free to point SQLAlchemy to any supported database server.

Note

The database_url property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named database_url (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

echo_queries[source]

Whether queries should be logged to sys.stderr (a boolean, defaults to False).

Note

The echo_queries property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

session[source]

An SQLAlchemy session created by session_factory.

Note

The session property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

session_factory[source]

An SQLAlchemy session factory connected to database_engine.

Note

The session_factory property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Automatically commit database changes when the with block ends.
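
The pattern of committing when a with block ends can be sketched with the sqlite3 module from the standard library instead of SQLAlchemy. The class below is hypothetical. Rolling back when the block raises an exception is an assumption of the sketch; the behaviour of DatabaseClient in that case is not stated here.

```python
import sqlite3


class AutoCommitSketch:
    """Hypothetical context manager that commits when the with block ends."""

    def __init__(self, database_file=":memory:"):
        self.connection = sqlite3.connect(database_file)

    def __enter__(self):
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        if exc_type is None:
            self.connection.commit()
        else:
            # Assumption: discard changes when the block raised an exception.
            self.connection.rollback()


with AutoCommitSketch() as client:
    client.connection.execute("CREATE TABLE notes (body TEXT)")
    client.connection.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

row_count = client.connection.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(row_count, client.connection.in_transaction)
```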

commit_changes()[source]

Commit database changes to disk.

class chat_archive.database.SchemaManager(*args, **kw)[source]

Easy to use database schema upgrades based on Alembic.

You can set the values of the alembic_directory, auto_create_schema, auto_upgrade_schema and declarative_base properties by passing keyword arguments to the class initializer.

Here’s an overview of the SchemaManager class:

Superclass: DatabaseClient
Special methods: __init__()
Public methods: initialize_schema() and run_migrations()
Properties: alembic_config, alembic_directory, auto_create_schema, auto_upgrade_schema, current_schema_revision, declarative_base, latest_schema_revision and schema_up_to_date
__init__(*args, **kw)[source]

Initialize a SchemaManager object.

This method automatically calls run_migrations() (and initialize_schema() when the database is initially created) to ensure that the database schema is up to date.

alembic_config[source]

A minimal Alembic configuration object.

This configuration object contains two options:

Raises: ValueError when alembic_directory isn’t set.

Note

The alembic_config property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

alembic_directory[source]

The absolute pathname of the directory containing Alembic’s env.py file (a string or None).

Note

The alembic_directory property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

auto_create_schema[source]

True if automatic database schema initialization is enabled, False otherwise.

This defaults to True when declarative_base is set, False otherwise.

Note

The auto_create_schema property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

auto_upgrade_schema[source]

True if automatic database schema upgrades are enabled, False otherwise.

This defaults to True when alembic_directory is set, False otherwise.

Note

The auto_upgrade_schema property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

current_schema_revision[source]

The current database schema revision in the database that we’re connected to (a string or None).

Note

The current_schema_revision property is a cached_property. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

declarative_base[source]

The base class for declarative models defined using SQLAlchemy.

Note

The declarative_base property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

latest_schema_revision[source]

The latest schema revision according to Alembic’s migration scripts (a string).

Note

The latest_schema_revision property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

schema_up_to_date

True if the database schema is up to date, False otherwise.

initialize_schema()[source]

Initialize the database schema using SQLAlchemy.

This method is automatically called when a SchemaManager object is created. In order to initialize the database schema the declarative_base property needs to be set, but if it’s not set then initialize_schema() won’t complain.

run_migrations()[source]

Upgrade the database schema using Alembic.

This method is automatically called when a SchemaManager object is created. In order to upgrade the database schema the alembic_directory property needs to be set, but if it’s not set then run_migrations() won’t complain.

class chat_archive.database.CustomVerbosity(**kw)[source]

Easily customize logging verbosity for a given scope.

This is used by SchemaManager to silence Alembic because it’s rather verbose by default, presumably because its primary purpose is to be a command line program and not a library embedded in an application.

When you initialize a CustomVerbosity object you are required to provide a value for the level property. You can set the values of the level and original_level properties by passing keyword arguments to the class initializer.

Here’s an overview of the CustomVerbosity class:

Superclass: PropertyManager
Special methods: __enter__() and __exit__()
Properties: level and original_level
level[source]

The overridden logging verbosity level.

Note

The level property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named level (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

original_level[source]

The original logging verbosity level.

Note

The original_level property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

__enter__()[source]

Customize the logging verbosity when entering the with block.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Restore the original logging verbosity when leaving the with block.
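
A minimal sketch of such a scoped verbosity override using the standard logging module. The class name and the logger parameter are hypothetical; which logger the real CustomVerbosity class adjusts is not stated here.

```python
import logging


class CustomVerbositySketch:
    """Hypothetical context manager that overrides a logger's level."""

    def __init__(self, level, logger=None):
        self.level = level
        self.logger = logging.getLogger() if logger is None else logger
        self.original_level = None

    def __enter__(self):
        # Remember the current level so it can be restored afterwards.
        self.original_level = self.logger.level
        self.logger.setLevel(self.level)
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        self.logger.setLevel(self.original_level)


logger = logging.getLogger("verbosity-sketch")
logger.setLevel(logging.INFO)
with CustomVerbositySketch(level=logging.WARNING, logger=logger):
    level_inside = logger.level
level_after = logger.level
print(level_inside == logging.WARNING, level_after == logging.INFO)
```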

chat_archive.emoji

Utility functions to translate between various forms of smilies and emoji.

chat_archive.emoji.normalize_emoji(text)[source]

Translate textual smilies, hollow smilies and macros to color emoji.

chat_archive.html

Utility functions for working with HTML encoded text.

chat_archive.html.BLOCK_TAGS = ['div', 'p', 'pre']

A list of strings with HTML tags that are considered block-level elements. The HTMLStripper emits an empty line before and after each block-level element that it encounters.

chat_archive.html.URL_PATTERN = re.compile('(http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)')

A compiled regular expression pattern to find URLs in text (credit: taken from urlregex.com).

chat_archive.html.html_to_text(html_text)[source]

Convert HTML to plain text.

Parameters: html_text – A fragment of HTML (a string).
Returns: The plain text (a string).

This function uses the HTMLStripper class that builds on top of the html.parser.HTMLParser class in the Python standard library.

chat_archive.html.text_to_html(text, callback=None)[source]

Convert plain text to HTML.

Parameters:
  • text – A fragment of plain text (a string).
  • callback – An optional callback that provides the caller a chance to pre-process text before it is encoded as HTML.
Returns:

The HTML encoded text (a string).

This function replaces URLs with <a href="..."> tags and escapes special characters, that’s it, nothing more.
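
The two steps named above (wrapping URLs in anchor tags and escaping special characters) can be sketched with the documented URL_PATTERN and html.escape(). The function name is hypothetical, and details such as the exact escaping rules of the real function are assumptions.

```python
import html
import re

# The pattern documented for chat_archive.html.URL_PATTERN.
URL_PATTERN = re.compile(
    r"(http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)"
)


def text_to_html_sketch(text, callback=None):
    """Escape plain text and turn URLs into <a href="..."> tags."""
    parts = []
    # re.split() with one capturing group alternates plain text and URLs.
    for index, token in enumerate(URL_PATTERN.split(text)):
        if index % 2:
            escaped = html.escape(token, quote=True)
            parts.append('<a href="%s">%s</a>' % (escaped, escaped))
        else:
            if callback is not None:
                # Give the caller a chance to pre-process the plain text.
                token = callback(token)
            parts.append(html.escape(token))
    return "".join(parts)


print(text_to_html_sketch("1 < 2, see https://example.com/a?b=1 ok"))
```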

class chat_archive.html.HTMLStripper(*, convert_charrefs=True)[source]

A simple HTML to text converter based on html.parser.HTMLParser.

__call__(data)[source]

Convert HTML to text.

Parameters: data – The HTML to convert to text (a string).
Returns: The converted text (a string).

This method calls compact_empty_lines() on the converted text to normalize superfluous empty lines caused by vertical whitespace emitted around block level elements like <div>, <p> and <pre>.

handle_charref(value)[source]

Process a decimal or hexadecimal numeric character reference.

Parameters: value – The decimal or hexadecimal value (a string).

handle_data(data)[source]

Capture decoded text data.

handle_endtag(tag)[source]

Emit empty lines around block level elements.

handle_entityref(name)[source]

Process a named character reference.

Parameters: name – The name of the character reference (a string).

handle_starttag(tag, attrs)[source]

Translate <br> tags to line breaks.

reset()[source]

Reset the state of the HTMLStripper instance.
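
The approach described for HTMLStripper (empty lines around block-level elements, line breaks for <br> tags, and compaction of superfluous empty lines) can be sketched on top of html.parser.HTMLParser. The class below is a simplified, hypothetical stand-in: it relies on convert_charrefs=True instead of handling character references itself, and it uses a regular expression in place of compact_empty_lines().

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = ["div", "p", "pre"]


class TextExtractorSketch(HTMLParser):
    """Hypothetical, simplified HTML to text converter."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.output = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.output.append("\n")
        elif tag in BLOCK_TAGS:
            self.output.append("\n\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.output.append("\n\n")

    def handle_data(self, data):
        self.output.append(data)

    def __call__(self, data):
        self.reset()
        self.output = []
        self.feed(data)
        self.close()
        text = "".join(self.output)
        # Stand-in for compact_empty_lines(): collapse runs of empty lines.
        return re.sub(r"\n{3,}", "\n\n", text).strip()


to_text = TextExtractorSketch()
print(to_text("<p>Hello<br>world</p><p>Bye &amp; thanks</p>"))
```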

chat_archive.html.keywords

Utility functions for working with HTML encoded text.

class chat_archive.html.keywords.KeywordHighlighter(*args, **kw)[source]

A simple keyword highlighter for HTML based on html.parser.HTMLParser.

__init__(*args, **kw)[source]

Initialize a KeywordHighlighter object.

Parameters:
  • keywords – A list of strings with keywords to highlight.
  • highlight_template – A template string with the {text} placeholder that’s used to highlight keyword matches.
__call__(data)[source]

Highlight keywords in the given HTML fragment.

Parameters: data – The HTML in which to highlight keywords (a string).
Returns: The highlighted HTML (a string).

handle_charref(value)[source]

Process a numeric character reference.

handle_data(data)[source]

Process textual data.

handle_endtag(tag)[source]

Process an end tag.

handle_entityref(name)[source]

Process a named character reference.

handle_starttag(tag, attrs)[source]

Process a start tag.

handle_startendtag(tag, attrs)[source]

Process a start tag without end tag.

render_attrs(attrs)[source]

Process the attributes of a tag.

reset()[source]

Reset the state of the keyword highlighter.

Clears the output buffer but preserves the keywords to be highlighted. This method is called implicitly during initialization.

chat_archive.html.redirects

Utility functions to pre-process URLs before rendering on a terminal.

In web browsers and chat clients the URLs behind hyperlinks are usually hidden, but in a terminal there’s no “out of band” mechanism to communicate the URL behind a hyperlink - the URL needs to appear literally in the text that is rendered to the terminal.

Given this requirement, I’ve become rather annoyed at Google prefixing every URL they can get their hands on with https://www.google.com/url?q=… because this user hostile “encoding” obscures the intended URL with a lot of fluff that I don’t care for.

This module contains the expand_url() function to transform redirect URLs into their target URL, the strip_redirects() function to transform all redirect URLs in a given text and RedirectStripper to transform all redirect URLs in a given HTML fragment.

chat_archive.html.redirects.GOOGLE_REDIRECT_URL = 'www.google.com/url'

The base URL of the Google redirect service (a string).

Note that the URL scheme is omitted on purpose, to enable a substring search for the Google redirect service regardless of whether a given URL is using the http:// or https:// scheme.

chat_archive.html.redirects.URL_PATTERN = re.compile('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')

A compiled regular expression pattern to find URLs in text (credit: taken from urlregex.com).

chat_archive.html.redirects.expand_url(url)[source]

Expand a redirect URL to its target URL.

Parameters: url – The URL to expand (a string).
Returns: The expanded URL (a string).

chat_archive.html.redirects.strip_redirects(text)[source]

Expand redirect URLs in the given text.

Parameters: text – The text to process (a string).
Returns: The processed text (a string).

chat_archive.html.redirects.strip_redirects_callback(match)[source]

Apply expand_url() to the matched URL.
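
The interplay of the three functions above can be sketched with urllib.parse and re.sub(). The function names carry a _sketch suffix because they are illustrations rather than the module's code, and the assumption that the target URL is carried in the q query parameter is based on the https://www.google.com/url?q=… example in the module description.

```python
import re
from urllib.parse import parse_qs, urlparse

GOOGLE_REDIRECT_URL = "www.google.com/url"

# The pattern documented for chat_archive.html.redirects.URL_PATTERN.
URL_PATTERN = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)


def expand_url_sketch(url):
    """Return the target of a Google redirect URL, or the URL unchanged."""
    if GOOGLE_REDIRECT_URL in url:
        targets = parse_qs(urlparse(url).query).get("q")
        if targets:
            return targets[0]
    return url


def strip_redirects_sketch(text):
    """Expand every redirect URL found in the given text."""
    return URL_PATTERN.sub(lambda match: expand_url_sketch(match.group(0)), text)


example = "see https://www.google.com/url?q=https%3A%2F%2Fexample.com%2Fpage&sa=D now"
print(strip_redirects_sketch(example))
```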

class chat_archive.html.redirects.RedirectStripper(*, convert_charrefs=True)[source]

Expand redirect URLs embedded in HTML.

This class uses html.parser.HTMLParser to parse HTML and expand any redirect URLs that it encounters to their target URL. The __call__() method provides an easy way to use this functionality.

__call__(data)[source]

Pre-process the URLs in the given HTML fragment.

Parameters: data – The HTML to pre-process (a string).
Returns: The pre-processed HTML (a string).

handle_charref(value)[source]

Process a numeric character reference.

handle_data(data)[source]

Process textual data.

handle_endtag(tag)[source]

Process an end tag.

handle_entityref(name)[source]

Process a named character reference.

handle_starttag(tag, attrs)[source]

Process a start tag.

handle_startendtag(tag, attrs)[source]

Process a start tag without end tag.

render_tag(tag, attrs, close)[source]

Process the attributes of a tag.

reset()[source]

Reset the state of the RedirectStripper instance.

Clears the output buffer. This method is called implicitly during initialization.

chat_archive.models

Database models for the chat-archive program based on SQLAlchemy.

The chat_archive.models module defines the following database models for the chat-archive program: Account, EmailAddress, TelephoneNumber, Contact, Conversation and Message.

chat_archive.models.metadata = MetaData(bind=None)

Define an explicit naming convention to simplify future database migrations.

class chat_archive.models.Base(**kwargs)

The most base type

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
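
The behavior described above can be imitated in plain Python, without SQLAlchemy. The class below is a hypothetical stand-in for a model class, and raising TypeError for unknown keys is an assumption of this sketch.

```python
class KwargsInitSketch:
    """Hypothetical stand-in for a declarative model class."""

    # Class attributes play the role of mapped columns.
    first_name = None
    last_name = None

    def __init__(self, **kwargs):
        cls = type(self)
        for key, value in kwargs.items():
            # Only names that exist on the class are accepted.
            if not hasattr(cls, key):
                raise TypeError(
                    "%r is an invalid keyword argument for %s" % (key, cls.__name__)
                )
            setattr(self, key, value)


contact = KwargsInitSketch(first_name="Ada", last_name="Lovelace")

try:
    KwargsInitSketch(nickname="ada")
except TypeError as error:
    rejected = str(error)
```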

chat_archive.models.address_mapping = Table('email_address_mapping', MetaData(bind=None), Column('contact_id', Integer(), ForeignKey('contacts.id'), table=<email_address_mapping>), Column('address_id', Integer(), ForeignKey('email_addresses.id'), table=<email_address_mapping>), schema=None)

Mapping table for many-to-many relationship between contacts and email addresses.

chat_archive.models.telephone_number_mapping = Table('telephone_number_mapping', MetaData(bind=None), Column('contact_id', Integer(), ForeignKey('contacts.id'), table=<telephone_number_mapping>), Column('telephone_number_id', Integer(), ForeignKey('telephone_numbers.id'), table=<telephone_number_mapping>), schema=None)

Mapping table for many-to-many relationship between contacts and telephone numbers.
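
To illustrate how such a mapping table links two other tables, here is a sketch using the standard library's sqlite3 module. The table and column names follow the definitions above, but the schema is reduced to the bare minimum and the rows are invented sample data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE email_addresses (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE email_address_mapping (
        contact_id INTEGER REFERENCES contacts (id),
        address_id INTEGER REFERENCES email_addresses (id)
    );
    INSERT INTO contacts (id, first_name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO email_addresses (id, value)
        VALUES (10, 'shared@example.com'), (11, 'ada@example.com');
    -- One contact can have several addresses and one address several contacts.
    INSERT INTO email_address_mapping (contact_id, address_id)
        VALUES (1, 10), (1, 11), (2, 10);
    """
)

rows = connection.execute(
    """
    SELECT contacts.first_name, email_addresses.value
    FROM contacts
    JOIN email_address_mapping ON email_address_mapping.contact_id = contacts.id
    JOIN email_addresses ON email_addresses.id = email_address_mapping.address_id
    ORDER BY contacts.first_name, email_addresses.value
    """
).fetchall()
```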

class chat_archive.models.Account(**kwargs)[source]

Database model for chat accounts.

id

The primary key of the account (an integer).

backend

The name of the backend that manages this account (a string).

name

A user defined name for the account (a string).

contacts

The contacts that have been imported using this account.

conversations

The conversations that have been imported using this account.

name_is_significant

True if the database contains multiple accounts with this backend, False otherwise.

__repr__()[source]

Render a human friendly representation of an Account object.

__str__()[source]

Render a human friendly representation of an Account object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.EmailAddress(**kwargs)[source]

Database model for email addresses of chat contacts.

id

The primary key of the email address (an integer).

value

The email address itself (a string).

__repr__()[source]

Render a human friendly representation of an EmailAddress object.

__str__()[source]

Render a human friendly representation of an EmailAddress object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.TelephoneNumber(**kwargs)[source]

Database model for telephone numbers of chat contacts.

id

The primary key of the telephone number (an integer).

value

The telephone number itself (a string).

__repr__()[source]

Render a human friendly representation of a TelephoneNumber object.

__str__()[source]

Render a human friendly representation of a TelephoneNumber object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.Contact(**kwargs)[source]

Database model for chat contacts.

id

The primary key of the contact (an integer).

account_id

A foreign key to associate contacts with accounts.

external_id

An optional backend specific identifier for contacts (an opaque string or None).

first_name

The contact’s first name (a string or None).

last_name

The contact’s last name (a string or None).

account

The account that this contact belongs to (an Account object).

email_addresses

The email addresses of this contact.

telephone_numbers

The telephone numbers of this contact.

sent_messages

The chat messages that were sent by this contact.

received_messages

The chat messages that were received by this contact.

first_name_is_unambiguous

True if this first name unambiguously refers to a single contact, False otherwise.

full_name

The full name of the contact (as an SQL expression).

unambiguous_name

The shortest unambiguous name of the contact (a string or None).

__repr__()[source]

Render a human friendly representation of a Contact object.

__str__()[source]

Render a human friendly representation of a Contact object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.Conversation(**kwargs)[source]

Database model for chat conversations.

id

The primary key of the conversation (an integer).

account_id

A foreign key to associate conversations with accounts.

external_id

An optional backend specific identifier for conversations (an opaque string or None).

name

An optional name for the conversation (a string or None).

last_modified

The time when the conversation was last modified (a datetime value or None).

import_complete

Whether the full conversation has been imported (a boolean, defaults to False).

import_errors

Whether errors were encountered during the import (a boolean, defaults to False).

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

is_group_conversation

Whether the conversation is a group conversation (a boolean, defaults to False).

account

The account that this conversation belongs to (an Account object).

messages

The chat messages that belong to this conversation.

have_unknown_senders

Whether this conversation includes messages from unknown senders (a boolean).

newest_message

The newest message in the conversation (a Message object or None).

oldest_message

The oldest message in the conversation (a Message object or None).

participants

The Contact objects that have participated in this conversation.

delete_messages()[source]

Delete existing chat messages in the conversation.

__str__()[source]

Render a human friendly representation of a Conversation object.

class chat_archive.models.Message(**kwargs)[source]

Database model for chat messages.

Note that the Message model doesn’t have a direct relationship to the Account model because these two models already have an indirect relationship via the Conversation model (in other words, messages are implicitly namespaced to accounts via conversations).

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

The primary key of the chat message (an integer).

external_id

An optional backend specific identifier for chat messages (an opaque string or None).

timestamp

The timestamp of the chat message (a datetime value).

conversation_id

A foreign key to associate chat messages with conversations.

sender_id

A foreign key that points to the contact who sent this message (an integer or None).

recipient_id

A foreign key that points to the contact who received this message (an integer or None).

raw

The raw message text in a backend specific format (a string or None).

This field was added to the database schema because the Slack backend emits chat messages in the somewhat peculiar mrkdwn format, which is “almost but not quite” human readable (in my opinion). When the Slack backend imports a new message, the following steps take place:

  1. The original message text is stored without any modifications in the raw column.

  2. A custom mrkdwn parser developed for the chat-archive program is used to convert raw to html (during the import).

  3. The value of html is used to generate the value of text (during the import).

    If this surprises you: I could have developed a second mrkdwn converter with a different output format, but that’s 150 lines of code I don’t care to repeat and html_to_text() works fine for this purpose 😇.

If the custom mrkdwn parser (which is bound to contain bugs) receives bug fixes in a new release of the chat-archive program then raw values can be used to regenerate text and html values.
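
The raw to html to text chain described above can be sketched with toy converters. Both converters are hypothetical stand-ins: the first only understands *bold* spans and the second simply drops tags, so neither reflects the package's actual mrkdwn parser or its html_to_text() function.

```python
import re
from html import escape
from html.parser import HTMLParser


def toy_mrkdwn_to_html(raw):
    """Hypothetical stand-in for the mrkdwn parser: handles *bold* only."""
    return re.sub(r"\*([^*\n]+)\*", r"<b>\1</b>", escape(raw, quote=False))


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def toy_html_to_text(html):
    """Hypothetical stand-in for the HTML to plain text conversion."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.parts)


def import_message(raw):
    """Derive html from raw and text from html, keeping raw untouched."""
    html = toy_mrkdwn_to_html(raw)
    text = toy_html_to_text(html)
    return {"raw": raw, "html": html, "text": text}


row = import_message("deploy is *done* & verified")
```

Because raw is stored unmodified, calling import_message() again with an improved converter regenerates html and text without contacting the chat service.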

text

The human readable plain text of the chat message (a string).

This field cannot be None (NULL) and is expected to always contain a nonempty chat message text. This field is used during searches and when chat-archive --color=never is run.

html

The formatted text of the chat message (a string or None).

When a chat message doesn’t contain text formatting or hyperlinks html will be None and text should be used instead. This field will be used when chat-archive --color=yes is run.

conversation

The conversation that this chat message took place in (a Conversation object or None).

sender

The contact that sent the message (a Contact object or None).

recipient

The contact that received the message (a Contact object or None).

newer_messages

Newer messages in the conversation (not yet sorted!).

next_message

The next message in the conversation (or None).

older_messages

Older messages in the conversation (not yet sorted!).

previous_message

The previous message in the conversation (or None).

find_distance(other_message)[source]

Compute the distance between two messages.

__repr__()[source]

Render a human friendly representation of a Message object.

__str__()[source]

Render a human friendly representation of a Message object.

chat_archive.profiling

Easy to use Python code profiling support.

class chat_archive.profiling.ProfileManager(*args, **kw)[source]

Base class for easy to use Python code profiling support.

This class makes it easy to enable and disable Python code profiling and save the results to a file. You can use it in a with statement to guarantee that the profile is saved even when your program is interrupted with Control-C. So when your program is too slow and you’re wondering why, you can restart it with profiling enabled, wait for it to get slow, give it a while to collect profile statistics and then interrupt it with Control-C.

When profile_file is set the class initializer method will automatically call enable_profiling().

You can set the values of the profile_file, profiler and profiling_enabled properties by passing keyword arguments to the class initializer.

Here’s an overview of the ProfileManager class:

Superclass: PropertyManager
Special methods: __enter__(), __exit__() and __init__()
Public methods: disable_profiling(), enable_profiling() and save_profile()
Properties: can_save_profile, profile_file, profiler and profiling_enabled

__init__(*args, **kw)[source]

Initialize a ProfileManager object.

Please refer to the PropertyManager documentation for details about the handling of arguments.

__enter__()[source]

Automatically enable code profiling when the with block starts.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Disable code profiling and save the profile statistics when the with block ends.
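
The enter and exit behavior can be illustrated with a small stand-alone context manager. The class below is a hypothetical sketch built on the standard library's cProfile module, not the ProfileManager implementation; it shows that the statistics are still written when the block is left through a KeyboardInterrupt.

```python
import cProfile
import os
import pstats
import tempfile


class ProfilingSketch:
    """Hypothetical context manager that profiles the body of a with block."""

    def __init__(self, profile_file):
        self.profile_file = profile_file
        self.profiler = cProfile.Profile()

    def __enter__(self):
        self.profiler.enable()
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        # Runs for normal exits and for exceptions such as KeyboardInterrupt.
        self.profiler.disable()
        self.profiler.dump_stats(self.profile_file)


profile_path = os.path.join(tempfile.mkdtemp(), "sketch.prof")

try:
    with ProfilingSketch(profile_path):
        sum(i * i for i in range(10000))
        raise KeyboardInterrupt  # simulate the operator pressing Control-C
except KeyboardInterrupt:
    pass

# The saved file can be loaded with pstats for inspection.
stats = pstats.Stats(profile_path)
```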

can_save_profile

True if save_profile() is expected to work, False otherwise.

profile_file[source]

The pathname of a file where Python profile statistics should be saved (a string or None).

Note

The profile_file property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

profiler[source]

A profile.Profile object (if profile_file is set) or None.

Note

The profiler property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

profiling_enabled[source]

True if code profiling is enabled, False otherwise.

Note

The profiling_enabled property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

enable_profiling()[source]

Enable Python code profiling.

disable_profiling()[source]

Disable Python code profiling.

save_profile(filename=None)[source]

Save gathered profile statistics to a file.

Parameters: filename – The pathname of the profile file (a string or None). Defaults to the value of profile_file.
Raises: ValueError when profiling was never enabled or filename isn’t given and profile_file also isn’t set.

chat_archive.utils

Utility functions for the chat-archive program.

chat_archive.utils.ensure_directory_exists(pathname)[source]

Create a directory if it doesn’t exist yet.

Parameters: pathname – The pathname of the directory (a string).

chat_archive.utils.get_full_name()[source]

Find the full name of the current user on the local system based on /etc/passwd.

Returns: A string with the full name of the current user or an empty string when this information is not available.

chat_archive.utils.get_secret(options, value_option, name_option, description)[source]

Get a secret needed to connect to a chat service (like a password or API token).

Parameters:
  • options – A dictionary with configuration options.
  • value_option – The name of the configuration option that defines the value of a secret (a string).
  • name_option – The name of the configuration option that defines the name of a secret in ~/.password-store (a string). See also get_secret_from_store().
  • description – A description of the type of secret that the operator will be prompted for (a string).
Returns:

The password (a string).
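
One plausible way to combine the three sources of a secret is sketched below. The order of precedence (configured value, then password store, then interactive prompt) is an assumption, and the lookup and prompt arguments are hypothetical stand-ins for get_secret_from_store() and prompt_for_password() so that the example runs without a password store or a terminal. The option names used in the example calls are made up as well.

```python
def get_secret_sketch(options, value_option, name_option, description, lookup, prompt):
    """Pick a secret from the options, a password store or a prompt (assumed order)."""
    secret = options.get(value_option)
    if not secret:
        name = options.get(name_option)
        # Fall back to the store when a name is configured, else ask the operator.
        secret = lookup(name) if name else prompt(description)
    return secret


# Hypothetical stand-ins for the password store and the interactive prompt.
fake_store = {"chat/slack": "token-from-store"}
fake_prompt = lambda description: "typed-" + description

from_value = get_secret_sketch(
    {"api-token": "inline-token"}, "api-token", "api-token-name",
    "API token", fake_store.__getitem__, fake_prompt,
)
from_store = get_secret_sketch(
    {"api-token-name": "chat/slack"}, "api-token", "api-token-name",
    "API token", fake_store.__getitem__, fake_prompt,
)
from_prompt = get_secret_sketch(
    {}, "api-token", "api-token-name",
    "API token", fake_store.__getitem__, fake_prompt,
)
```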

chat_archive.utils.get_secret_from_store(name, directory=None)[source]

Use qpass to get a secret from ~/.password-store.

Parameters:
  • name – The name of a password or a search pattern that matches a single entry in the password store (a string).
  • directory – The directory to use (a string, defaults to ~/.password-store).
Returns:

The secret (a string).

Raises:

ValueError when the given name doesn’t match any entries or matches multiple entries in the password store.

chat_archive.utils.prompt_for_password(prompt_text)[source]

Interactively prompt the operator for a password.

chat_archive.utils.utc_to_local(utc_value)[source]

Convert a UTC datetime object to the local timezone.
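
A conversion of this kind can be sketched with the standard library alone. The function below is a hypothetical equivalent, not the package's implementation, and it assumes that a naive datetime value is meant as UTC.

```python
from datetime import datetime, timezone


def utc_to_local_sketch(utc_value):
    """Convert a UTC datetime to the local timezone (naive input is assumed UTC)."""
    if utc_value.tzinfo is None:
        utc_value = utc_value.replace(tzinfo=timezone.utc)
    # astimezone() without arguments converts to the system's local timezone.
    return utc_value.astimezone()


local_value = utc_to_local_sketch(datetime(2018, 7, 1, 12, 0, 0))
```

The result is a timezone-aware datetime that refers to the same instant as the input, so converting it back to UTC yields the original value.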