A port of the Ruby gem twitter-text-rb to Python.

Changes in 2.0

See the pull request for details.

Usage

You can either create a new TwitterText object with the text of the tweet you want to process, e.g. TwitterText('twitter-text-py is #awesome'), or use any of the submodule objects directly (Autolink, Extractor, HitHighlighter or Validation), passing in the tweet text as an argument.

The library also contains a Django template filter that applies the auto_link method to the text passed in. It can optionally apply the hit_highlight method as well. Example:

{% load twitterize %}

{{ obj.body|twitter_text }} <!-- just add the links -->
{{ obj.body|twitter_text:"my term" }} <!-- add the links and highlight the search term -->

You can test that the library is working correctly by running python tests.py inside the twitter_text directory.

TwitterText(text)

Properties:

text: the original text you passed in, or the modified version if you've called any functions on the object.
original_text: the original text you passed in; never modified. Useful for a fallback or to do comparisons.
has_been_linked: boolean denoting if any of the Autolink functions have been called. (Mostly for internal use.)
tweet_length: the value returned by validation.tweet_length or None if that function has not yet been called.
tweet_is_valid: boolean returned by validation.tweet_invalid or None if that function has not yet been called.
validation_error: the validation error string returned by validation.tweet_invalid or None if that function has not yet been called.
autolink: property pointing to an Autolink object initialized with text
extractor: property pointing to an Extractor object initialized with text
highlighter: property pointing to a HitHighlighter object initialized with text
validation: property pointing to a Validation object initialized with text

Autolink(text)

This object modifies the text passed to it (and the parent TwitterText.text if present).

Defaults

These may be overridden by kwargs on a particular method.

url_class = 'tweet-url'
list_class = 'list-slug'
username_class = 'username'
hashtag_class = 'hashtag'

Methods:

auto_link(self, **kwargs)

Add <a></a> tags around the usernames, lists, hashtags and URLs in the provided text. The <a> tags can be controlled with the following kwargs:

url_class: class to add to all <a> tags
list_class: class to add to list <a> tags
username_class: class to add to username <a> tags
hashtag_class: class to add to hashtag <a> tags
username_url_base: the value for href attribute on username links. The @username (minus the @) will be appended at the end of this.
list_url_base: the value for href attribute on list links. The @username/list (minus the @) will be appended at the end of this.
hashtag_url_base: the value for href attribute on hashtag links. The #hashtag (minus the #) will be appended at the end of this.
suppress_lists: disable auto-linking to lists
suppress_no_follow: do not add rel="nofollow" to auto-linked items
html_attrs: a dictionary of HTML attributes to add to non-Twitter links

auto_link_usernames_or_lists(self, **kwargs)

Add <a></a> tags around the usernames and lists in the provided text. The <a> tags can be controlled with the following kwargs:

url_class: class to add to all <a> tags
list_class: class to add to list <a> tags
username_class: class to add to username <a> tags
username_url_base: the value for href attribute on username links. The @username (minus the @) will be appended at the end of this.
list_url_base: the value for href attribute on list links. The @username/list (minus the @) will be appended at the end of this.
suppress_lists: disable auto-linking to lists
suppress_no_follow: do not add rel="nofollow" to auto-linked items

auto_link_hashtags(self, **kwargs)

Add <a></a> tags around the hashtags in the provided text. The <a> tags can be controlled with the following kwargs:

url_class: class to add to all <a> tags
hashtag_class: class to add to hashtag <a> tags
hashtag_url_base: the value for href attribute. The hashtag text (minus the #) will be appended at the end of this.
suppress_no_follow: do not add rel="nofollow" to auto-linked items
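
To make the effect of these kwargs concrete, here is a minimal standalone sketch. It is not the library's implementation: the hashtag pattern is a simplified assumption, the function name sketch_auto_link_hashtags is hypothetical, and the default hashtag_url_base is a placeholder URL chosen for illustration. Only the two class defaults are taken from the Defaults list above.

```python
import re

# Simplified hashtag pattern (an assumption, not the library's regex):
# "#" followed by word characters, at least one of them a letter or underscore.
HASHTAG_RE = re.compile(r"(?<!\w)#([A-Za-z0-9_]*[A-Za-z_][A-Za-z0-9_]*)")


def sketch_auto_link_hashtags(text, **kwargs):
    # The class defaults mirror the documented Defaults;
    # the URL base is a placeholder used only in this sketch.
    url_class = kwargs.get("url_class", "tweet-url")
    hashtag_class = kwargs.get("hashtag_class", "hashtag")
    hashtag_url_base = kwargs.get(
        "hashtag_url_base", "https://example.com/search?q=%23"
    )
    rel = "" if kwargs.get("suppress_no_follow") else ' rel="nofollow"'

    def link(match):
        tag = match.group(1)  # hashtag text without the leading "#"
        return '<a href="{0}{1}" class="{2} {3}"{4}>#{1}</a>'.format(
            hashtag_url_base, tag, url_class, hashtag_class, rel
        )

    return HASHTAG_RE.sub(link, text)


linked = sketch_auto_link_hashtags("twitter-text-py is #awesome")
print(linked)
```

The exact attribute order and markup produced by the real Autolink object may differ; the sketch only shows how the hashtag text (minus the #) is appended to the URL base and how suppress_no_follow removes the rel attribute.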

auto_link_urls_custom(self, **kwargs)

Add <a></a> tags around the URLs in the provided text. Any elements in kwargs (except suppress_no_follow) will be converted to HTML attributes and placed in the <a> tag. Unless kwargs contains suppress_no_follow, the rel="nofollow" attribute will be added.

Extractor

This object does not modify the text passed to it (or the parent TwitterText.text if present).

Methods

extract_mentioned_screen_names

Extracts a list of all usernames mentioned in the Tweet text. If the text contains no username mentions an empty list will be returned.

If a transform is given, then it will be called with each username.

extract_mentioned_screen_names_with_indices

Extracts a list of all usernames mentioned in the Tweet text along with the indices for where the mention occurred in the format:

{
    'screen_name': username_string,
    'indices': ( start_position, end_position )
}

If the text contains no username mentions, an empty list will be returned.

If a transform is given, then it will be called with each username, the start index, and the end index in the text.
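
The result shape described above can be sketched without the library. This is an illustration under stated assumptions: the mention pattern is heavily simplified, the function name sketch_mentions_with_indices is hypothetical, the indices are assumed to start at the @ sign, and the transform's return value is assumed to take the place of the dictionary entry.

```python
import re

# Simplified mention pattern (an assumption): "@" not preceded by a word
# character or another "@", followed by 1 to 20 word characters.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_@])@([A-Za-z0-9_]{1,20})")


def sketch_mentions_with_indices(text, transform=None):
    results = []
    for match in MENTION_RE.finditer(text):
        name = match.group(1)
        start, end = match.span()  # span includes the leading "@"
        if transform is not None:
            # Assumed behavior: the transform receives the username and both
            # indices, and its return value is collected.
            results.append(transform(name, start, end))
        else:
            results.append({"screen_name": name, "indices": (start, end)})
    return results


mentions = sketch_mentions_with_indices("hi @alice and @bob")
print(mentions)
```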

extract_reply_screen_name

Extracts the first username replied to in the Tweet text. If the text does not contain a reply None will be returned.

If a transform is given then it will be called with the username replied to (if any).
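
A rough standalone sketch of this behavior follows. It assumes that a reply is a tweet whose first non-whitespace token is an @username; the pattern and the function name sketch_reply_screen_name are illustrative and are not taken from the library.

```python
import re

# Assumption: a reply starts (after optional whitespace) with "@username".
REPLY_RE = re.compile(r"^\s*@([A-Za-z0-9_]{1,20})")


def sketch_reply_screen_name(text, transform=None):
    match = REPLY_RE.match(text)
    if match is None:
        return None  # the text is not a reply
    name = match.group(1)
    # Assumed behavior: the transform's return value is passed through.
    return transform(name) if transform is not None else name


print(sketch_reply_screen_name("@alice thanks for the port"))
print(sketch_reply_screen_name("thanks @alice"))
```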

extract_urls

Extracts a list of all URLs included in the Tweet text. If the text contains no URLs an empty list will be returned.

If a transform is given then it will be called for each URL.

extract_urls_with_indices

Extracts a list of all URLs included in the Tweet text along with the indices in the format:

{
    'url': url_string,
    'indices': ( start_position, end_position )
}

If the text contains no URLs an empty list will be returned.

If a transform is given then it will be called for each URL, the start index, and the end index in the text.

extract_hashtags

Extracts a list of all hashtags included in the Tweet text. If the text contains no hashtags an empty list will be returned. The list returned will not include the leading # character.

If a transform is given then it will be called for each hashtag.
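
As an illustration of the return value (hashtag text without the leading #), here is a small sketch that does not use the library. The pattern is a simplified assumption that rejects purely numeric tags, and sketch_extract_hashtags is a hypothetical name; how the real Extractor treats the transform's return value is not confirmed here.

```python
import re

# Simplified hashtag pattern (an assumption): at least one letter or
# underscore is required, so "#2014" is not treated as a hashtag.
HASHTAG_RE = re.compile(r"(?<!\w)#([A-Za-z0-9_]*[A-Za-z_][A-Za-z0-9_]*)")


def sketch_extract_hashtags(text, transform=None):
    tags = HASHTAG_RE.findall(text)  # captured group excludes the "#"
    if transform is not None:
        # Assumed behavior: collect the transform's return values.
        return [transform(tag) for tag in tags]
    return tags


tags = sketch_extract_hashtags("twitter-text-py is #awesome #2014 #py3")
print(tags)
```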

extract_hashtags_with_indices

Extracts a list of all hashtags included in the Tweet text along with the indices in the format:

{
    'hashtag': hashtag_text,
    'indices': ( start_position, end_position )
}

If the text contains no hashtags an empty list will be returned. The list returned will not include the leading # character.

If a transform is given then it will be called for each hashtag.

HitHighlighter

Defaults

These may be overridden by kwargs on a particular method.

highlight_tag = 'em'
highlight_class = 'search-hit'

Methods

hit_highlight(self, query, **kwargs)

Add <em></em> tags around occurrences of query provided in the text except for occurrences inside hashtags.

The <em></em> tags or CSS class can be overridden using the highlight_tag and/or highlight_class kwargs. For example:

python> HitHighlighter('test hit here').hit_highlight('hit', highlight_tag='strong', highlight_class='search-term')
        => "test <strong class='search-term'>hit</strong> here"
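
The rule of skipping occurrences inside hashtags can be sketched as follows. This is not the library's code: sketch_hit_highlight is a hypothetical name, the hashtag detection is simplified, and case-insensitive matching is an assumption made for the sketch. The defaults come from the Defaults list above.

```python
import re


def sketch_hit_highlight(text, query, highlight_tag="em",
                         highlight_class="search-hit"):
    if not query:
        return text  # nothing to highlight

    # Spans of hashtags, found with a simplified pattern (an assumption).
    hashtag_spans = [m.span() for m in re.finditer(r"#\w+", text)]

    def inside_hashtag(start, end):
        return any(s <= start and end <= e for s, e in hashtag_spans)

    def wrap(match):
        if inside_hashtag(*match.span()):
            return match.group(0)  # leave hits inside hashtags untouched
        return "<{0} class='{1}'>{2}</{0}>".format(
            highlight_tag, highlight_class, match.group(0)
        )

    # Case-insensitive matching is an assumption of this sketch.
    return re.sub(re.escape(query), wrap, text, flags=re.IGNORECASE)


print(sketch_hit_highlight("test hit here", "hit",
                           highlight_tag="strong",
                           highlight_class="search-term"))
print(sketch_hit_highlight("a #hit and hit", "hit"))
```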

Validation

Methods

tweet_length

Returns the length of the string as it would be displayed. This is equivalent to the length of the string's Unicode NFC form (see: http://www.unicode.org/reports/tr15). This is needed in order to consistently calculate the length of a string no matter which actual form was transmitted. For example:

U+0065 Latin Small Letter E
+ U+0301 Combining Acute Accent
----------
= 2 characters (3 bytes in UTF-8), displayed as é (1 visual glyph)

The NFC of {U+0065, U+0301} is {U+00E9}, which is a single character with a display length of 1.

The string could also contain U+00E9 already, in which case the canonicalization will not change the value.
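
The example above can be checked with Python's standard unicodedata module. This sketch only demonstrates the NFC length calculation; the helper name nfc_length is hypothetical, and the library's own tweet_length may apply further rules that are not shown here.

```python
import unicodedata


def nfc_length(text):
    # Length of the string after canonical composition (NFC).
    return len(unicodedata.normalize("NFC", text))


decomposed = "e\u0301"  # U+0065 followed by U+0301
composed = "\u00e9"     # U+00E9, already in composed form

print(len(decomposed), nfc_length(decomposed))  # raw length vs NFC length
print(len(composed), nfc_length(composed))
print(unicodedata.normalize("NFC", decomposed) == composed)
```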

tweet_invalid

Check the text for any reason that it may not be valid as a Tweet. This is meant as a pre-validation before posting to api.twitter.com. There are several server-side reasons for Tweets to fail but this pre-validation will allow quicker feedback.

Returns False if this text is valid. Otherwise one of the following strings will be returned:

"Too long": if the text is too long
"Empty text": if the text is empty
"Invalid characters": if the text contains non-Unicode or any of the disallowed Unicode characters

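The return convention (False when valid, otherwise an error string) can be sketched as below. The 140-character limit, the set of disallowed characters and the order of the checks are assumptions made for this illustration, and sketch_tweet_invalid is a hypothetical name rather than the library's function.

```python
import unicodedata

MAX_LENGTH = 140  # assumption: the limit is not stated in this document

# Assumption: an illustrative set of disallowed characters
# (U+FFFE, U+FEFF, U+FFFF and the directional marks U+202A to U+202E).
INVALID_CHARACTERS = {"\ufffe", "\ufeff", "\uffff"} | {
    chr(code) for code in range(0x202A, 0x202F)
}


def sketch_tweet_invalid(text):
    if not text:
        return "Empty text"
    if len(unicodedata.normalize("NFC", text)) > MAX_LENGTH:
        return "Too long"
    if any(ch in INVALID_CHARACTERS for ch in text):
        return "Invalid characters"
    return False  # False means the text passed every check


for candidate in ["hello", "", "x" * 141, "bad\uffff"]:
    print(repr(sketch_tweet_invalid(candidate)))
```

Because a valid text yields False and an invalid one yields a non-empty string, the result can be used directly in a condition such as `if sketch_tweet_invalid(text): ...`.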