canonicalize

Name

canonicalize - Canonicalize Internet Telephony URIs

Synopsis

canonicalize [-a] [-n] [-d] [-m] [-A aliasfile] [-D dial plan file] [-T database-uri]

Availability

Canonicalize is a utility which is part of CINEMA, the Columbia INternet Extensible Multimedia Architecture. It is included in the sipd distribution. It converts SIP and tel URLs to a canonical form that is then used by sipd to look up registration and other information. URLs with other schemes are passed through unchanged.

Description

canonicalize converts SIP and tel URLs into a canonicalized form by applying one of four transformations:

Database:

Canonicalize based on the CINEMA database file's aliases table. This transformation is attempted for both sip: and tel: URLs. If the aliases table has an entry whose alias field equals user@host, canonicalize returns a URL built from that entry's primary_user field.

Aliases:

Canonicalize based on the system e-mail aliases file. This transformation is only attempted for sip: URLs. If the alias contains a host name, i.e., is of the form user@host, it replaces the host name in the URL.

Name mapping:

Convert full names to user names, based on the system password database. This transformation is only attempted for sip: URLs and only changes the user component of the URL.

Dialplan:

Convert telephone numbers, based on a dial plan, to appropriate user or gateway URIs. This transformation is applied only to tel: URLs, sip: URLs with the user=phone parameter and sip: URLs that "look like" phone numbers, i.e., contain only digits, hyphens and possibly a leading plus sign.
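The "looks like a phone number" test can be approximated with a short check. This is only a sketch: the function name looks_like_phone is hypothetical, and it is an assumption that at least one digit is required and that hyphens may appear anywhere after the optional plus sign.

```python
import re

# Assumption: optional leading "+", then only digits and hyphens,
# with at least one digit somewhere in the string.
_PHONE_LIKE = re.compile(r"\+?(?=[0-9-]*[0-9])[0-9-]+")

def looks_like_phone(user):
    """Return True if the user part of a sip: URI resembles a phone number."""
    return _PHONE_LIKE.fullmatch(user) is not None
```

Under these assumptions, a user part such as +1-212-555-1234 would be treated as a phone number, while alice or 12+34 would not.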

Each transformation can be enabled individually. The transformations are applied in the order given above. If any transformation matches, i.e., causes the URI to change, no further transformations are applied.
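The first-match rule can be pictured as a simple loop. The sketch below is illustrative and not taken from the canonicalize sources; canonicalize_uri and the callables passed to it are hypothetical names, and each callable is assumed to return its input unchanged when it has nothing to do.

```python
def canonicalize_uri(uri, transformations):
    """Apply the enabled transformations in order.

    Stops at the first transformation that changes the URI; if none
    does, the original URI is returned unchanged.
    """
    for transform in transformations:
        result = transform(uri)
        if result != uri:
            return result
    return uri
```

A caller would pass only the enabled transformations, in the order database, aliases, name mapping, dial plan.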

Options

-T database-uri

Perform database lookups in the aliases table to transform alias to primary_user. Database tables are specified using pseudo-URIs, in the same way as they are for sipd. The only currently supported database URI scheme is sql, representing MySQL tables.

-a

Perform aliases transformations on the user field of sip URIs. canonicalize searches an alias file and (on Unix systems) the mail.aliases NIS map for any matching aliases.

-A aliasfile

Name of alias file; default /etc/aliases. The format of this file is described in aliases(4). Lines beginning with white space are treated as continuation lines; lines beginning with # are comments. Aliases may recurse. This option is ignored unless the -a option is also specified.
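As an illustration of the file format, the following sketch reads alias text with continuation lines and comments and expands aliases recursively. It assumes the usual "name: target, target" layout described in the aliases manual page and case-insensitive alias names; parse_aliases and resolve are hypothetical helpers, not part of canonicalize.

```python
def parse_aliases(text):
    """Parse alias file text into a dict of alias -> list of targets."""
    logical = []
    for raw in text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue  # comment or blank line
        if raw[0].isspace() and logical:
            logical[-1] += " " + raw.strip()  # continuation line
        else:
            logical.append(raw.strip())
    aliases = {}
    for entry in logical:
        name, sep, targets = entry.partition(":")
        if sep:
            aliases[name.strip().lower()] = [
                t.strip() for t in targets.split(",") if t.strip()
            ]
    return aliases

def resolve(aliases, name, path=frozenset()):
    """Expand an alias recursively; names already on the path are not re-expanded."""
    key = name.lower()
    if key in path or key not in aliases:
        return [name]
    result = []
    for target in aliases[key]:
        result.extend(resolve(aliases, target, path | {key}))
    return result
```

The path argument guards against alias loops, which recursion would otherwise follow forever.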

-n

If this flag is present, canonicalize will perform name mapping transformations on the user field of sip URIs by searching the GECOS (real name) field of the system password database for matching names. Names match if the user field matches the user name (first field) or one or more of the components of the real name field. (One component must match exactly; others may be prefixes.) The components can be separated by periods or underscore characters. For example, the password entry

  jqd:*:21533:21533:John Quincy Doe,4F582:/home/jqd:/bin/sh

matches the following names in the user part of the URL: jqd, John, Quincy, Doe, J.Q.Doe, Quincy.Doe, Quin.Doe, John.Q.Doe, John_Doe, JohnDoe, JDoe, JohnQuincyDoe, JohnQDoe.

This transformation is currently only applicable on Unix systems.

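One way to read the matching rule is sketched below. It is a guess at the behavior, consistent with the example names listed above, and not the actual implementation: name_matches is a hypothetical function, and it assumes that comparison is case-insensitive, that components must appear in their original order, and that the real name is the GECOS text before the first comma.

```python
def name_matches(user, login, gecos):
    """Return True if `user` matches the login name or the real name."""
    if user == login:
        return True
    # Assumption: the real name is the GECOS text before the first comma.
    components = [c.lower() for c in gecos.split(",", 1)[0].split()]
    target = user.lower()

    def walk(pos, first, exact_seen):
        # pos: position in target; first: next usable name component.
        if pos == len(target):
            return exact_seen  # at least one component must match exactly
        if pos > 0 and target[pos] in "._":
            pos += 1  # skip a single separator between pieces
        for i in range(first, len(components)):
            comp = components[i]
            for n in range(1, len(comp) + 1):
                if not target.startswith(comp[:n], pos):
                    break
                if walk(pos + n, i + 1, exact_seen or n == len(comp)):
                    return True
        return False

    return walk(0, 0, False)
```

With the password entry shown above, this sketch accepts every name in the list and rejects a bare initial such as J, because no component would match exactly.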
-d

If this flag is listed, canonicalize performs dial plan transformations on URIs that represent telephone numbers: tel URIs, and sip URIs with the user=phone parameter. It searches a dial plan file, specified with the -D flag, for the transformation patterns.

-D dial plan file

Name of dial plan mapping file; default dialplan in the current directory. The file consists of lines separated by LF or CRLF. Each line has two strings, separated by white space. The first is a pattern to match; the second is a replacement string.

There are two types of patterns allowed: globs, and regular expressions. Glob patterns consist of literal characters and the special characters *, ?, (), {}, and [].

? Matches any single character.

* Matches any sequence of zero or more characters.

[chars] Matches any single character in chars. If chars contains a sequence of the form a-b then any character between a and b (inclusive) will match. If the first character of chars is ^, the sense of the test is negated (i.e. it matches all characters not listed in the range).

\x Matches the literal character x.

{pat1,pat2,...} Matches any of the patterns pat1, pat2, etc.

^pat Negates the sense of the match: the pattern matches if pat does not match the string. The caret must be the first character of the pattern.

(pat1)pat2 Does not affect matching, but causes the part of the string matched by the parenthesized pat1 to be left out of the text substituted for $ in the replacement. This is only significant at the beginning of the pattern. It may not be used in negated matches.

Alternatively, if the pattern is enclosed in slashes, as /pat/, it is matched as a POSIX extended regular expression rather than a glob. Please see the POSIX standards documenting regular expressions. The pattern may optionally be preceded with an exclamation point (!) to negate the sense of the match.

Entries in the right column are the resulting canonicalized URIs to resolve. If the pattern is a glob, these URIs may contain the unquoted character $. If present, it is replaced by the user part of SIP URIs or the whole number part of tel URLs, excluding any part which the pattern matched with its parenthesized prefix. Multiple $ signs will all be replaced by this string. (To include a literal $ in the canonicalized URI, encode it using URI-quoting, as %24.)
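The glob matching and $ substitution described above can be sketched as follows. This is an approximation under stated assumptions, not the libcanon code: glob_to_regex and apply_glob_rule are hypothetical names, the glob must match the whole number, nested braces and negated (^pat) patterns are not handled, and the dial plan entries used with it are made up.

```python
import re

def glob_to_regex(glob):
    """Translate ?, *, [chars], backslash escapes and {a,b} to a Python regex."""
    out, i = [], 0
    while i < len(glob):
        c = glob[i]
        if c == "\\" and i + 1 < len(glob):
            out.append(re.escape(glob[i + 1]))  # escaped literal character
            i += 1
        elif c == "?":
            out.append(".")
        elif c == "*":
            out.append(".*")
        elif c == "[":
            end = glob.index("]", i + 1)
            out.append(glob[i:end + 1])  # ranges and ^ mean the same in a regex
            i = end
        elif c == "{":
            end = glob.index("}", i + 1)
            alternatives = glob[i + 1:end].split(",")
            out.append("(?:" + "|".join(glob_to_regex(a) for a in alternatives) + ")")
            i = end
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)

def apply_glob_rule(pattern, replacement, number):
    """Return the rewritten URI, or None if the glob does not match."""
    prefix = ""
    if pattern.startswith("("):
        close = pattern.index(")")
        prefix, pattern = pattern[1:close], pattern[close + 1:]
    regex = "(" + glob_to_regex(prefix) + ")(" + glob_to_regex(pattern) + ")"
    match = re.fullmatch(regex, number)
    if match is None:
        return None
    # Every $ is replaced by the number minus the parenthesized prefix.
    return replacement.replace("$", match.group(2))
```

For instance, with the made-up rule "(9)1*" and replacement "tel:+$", the number 912125551234 loses its leading 9 and becomes tel:+12125551234.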

For regular expressions, the substituted patterns are $0 for the entire string matched by the regular expression, and $1 through $9 for the first through ninth parenthesized subexpression.
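The $0 through $9 substitution can be illustrated with Python's re module. This is a sketch under assumptions: Python regular expressions stand in for POSIX extended regular expressions, the pattern is searched for anywhere in the number unless it is anchored, and apply_regex_rule and the example rule are hypothetical.

```python
import re

def apply_regex_rule(pattern, replacement, number):
    """Return the rewritten URI, or None if the pattern does not match."""
    match = re.search(pattern, number)
    if match is None:
        return None

    def expand(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ""  # assumption: a missing subexpression expands to nothing
        return match.group(index) or ""

    # $0 is the whole match; $1..$9 are the parenthesized subexpressions.
    return re.sub(r"\$([0-9])", expand, replacement)
```

With the made-up pattern ^\+1212([0-9]{7})$ and replacement sip:$1@gw.example.com, the number +12125551234 would be rewritten to sip:5551234@gw.example.com.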

Comments, indicated by # in the first column, are allowed, as are blank lines.
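Reading such a file could look like the sketch below. It is illustrative only: parse_dialplan is a hypothetical helper, it assumes neither column contains embedded white space, and the tuple layout it returns is an arbitrary choice.

```python
def parse_dialplan(text):
    """Parse dial plan text into (kind, negated, pattern, replacement) tuples."""
    rules = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and comments with # in the first column.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected a pattern and a replacement")
        pattern, replacement = fields
        if pattern.startswith("!/") and pattern.endswith("/") and len(pattern) > 2:
            kind, negated, pattern = "regex", True, pattern[2:-1]
        elif pattern.startswith("/") and pattern.endswith("/") and len(pattern) > 1:
            kind, negated, pattern = "regex", False, pattern[1:-1]
        elif pattern.startswith("^"):
            kind, negated, pattern = "glob", True, pattern[1:]
        else:
            kind, negated = "glob", False
        rules.append((kind, negated, pattern, replacement))
    return rules
```

Because splitlines() accepts both LF and CRLF line endings, the same code handles files written on Unix and on Windows.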

This option has no effect unless the -d option is specified.

If none of the options -a, -n, or -d is given, canonicalize will only perform the trivial "identity" transformation, i.e., return the original URL unmodified.

Interface

On startup, canonicalize either writes :OK to its standard output, after its initialization code has successfully completed, or it writes :ERROR text and exits if there has been an error.

Once it is running, canonicalize reads newline-separated URIs from its standard input, and returns canonicalized responses over its standard output.

Upon reading a URI, canonicalize will either respond with a URI, or with an error message.

A standard response will contain a URI, on a line by itself, that canonicalize has determined to be the canonical resolution of this URI. If no canonicalization could be performed on the URI, the URI will be returned verbatim (possibly with some non-semantically-significant formatting transformations).

Error messages begin with a leading colon (:). Due to the syntax of URIs, no URI can begin with a colon. There are two error messages defined: :ERROR text, and :MULTIPLE n.

The error message :ERROR indicates that canonicalize could not understand the URI it was given, or that some internal error has occurred, such as failure to allocate memory or open files. The error message may be followed by UTF-8 text giving a human-readable description of the error. URLs that are not SIP or tel URLs are not errors, but are rather just copied from stdin to stdout.

The error message :MULTIPLE indicates that canonicalize could not determine a unique resolution of the passed-in URI, and several possible matches were found. The decimal integer following :MULTIPLE gives the number of matches. That many possible matches will follow the :MULTIPLE error. The matches will be formatted as :+name <addr> (i.e., :+ followed by a SIP name-addr). Note that :MULTIPLE 0 is legal if canonicalize cannot or does not wish to list the multiple addresses found.

Run-Time settings

canonicalize depends on libcanon, a dynamically-loadable shared library (libcanon.so in Unix, libcanon.dll in Windows), which implements the core canonicalization functionality. The LD_LIBRARY_PATH (Unix) and PATH (Windows) environment variables should be set to include the directory containing libcanon. In both binary and source distributions, libcanon is in the same directory as canonicalize.

Author

Jonathan Lennox and Henning Schulzrinne, Columbia University

Copyright

Copyright 1997-2001 by Columbia University; all rights reserved

Permission to use, copy, modify, and distribute this software and its documentation for research and educational purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that the copyright notice and warranty disclaimer appear in supporting documentation, and that the names of the copyright holders or any of their entities not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. Use of this software in whole or in parts for direct commercial advantage requires explicit prior permission.

The copyright holders disclaim all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall the copyright holders be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortuous action, arising out of or in connection with the use or performance of this software.

Last updated by Jonathan Lennox

