Saturday, December 29, 2012

DataStage UniVerse Quick Reference


================================================================
VOC
================================================================
VOC is the uniVerse system table.
VOC contains all file pointers, commands (called verbs) and miscellaneous items:
&PH& (phantom history)
&SAVEDLISTS& (saved lists)
&ED& (edit macros)
&COMO& (COMO history)

================================================================
TCL History
================================================================
.L (list history)
.? (help on history commands)
.X3 (execute the 3rd command in stack)
.D3 (delete 3rd command in stack)
.R3 (recall the 3rd command to the top of the stack)
.C/DS/DSR/ (change DS to DSR)
.S DS.3 (save command to VOC as a sentence)
ED VOC DS.3 (edit the VOC record DS.3)

================================================================
ED commands
================================================================
EX (exit)
2 (goto line 2)
HELP R (help on commands which start with R)
B (goto to bottom or last line)
T (goto to top)
P (page or list the next 20 lines)
PP (center the current line on the screen)
L DS (locate the word DS)
D3 (deletes next 3 lines)
DUP (duplicates line)
<> (< and > mark the start and end of a block at the current line)
COPY (copies block to current location)
I (insert lines until you insert a blank line)
C/DS/DSR/ (change DS to DSR on the current line)
R/DS/DSR/ (does same as above)
R DS (replaces whole line with DS)
C (does last change again)
C13/DS/DSR/ (change DS to DSR on the next 13 lines)
C13,DS,DSR, (does same as above)
C999999/DSR/smoe/G (changes DSR to smoe on all lines)
(the G says globally on a line)
(meaning multiple occurrences on a line)
FI (files or writes to disk and exits)
FI DS (files this record as DS)
LOAD DSR.BP READ.INV (loads record READ.INV from file DSR.BP
into this record. Will prompt for line numbers)

================================================================
Dictionaries
================================================================
Dictionaries describe data.
Each uniVerse file has a dictionary side and a data side.
The data side is the default for most commands.

LIST DICT KD_MD_COLUMNS (list dictionary side of hash file KD_MD_COLUMNS)

ED DICT KD_MD_COLUMNS @ID (edit dictionary item @ID)

1 -> D (says this is a D type dictionary)
2 -> 0 (@ID is always field 0)
3 -> (oconv)
4 -> KD_MD_COLUMNS (column heading)
5 -> 10L (length plus justification)
6 -> S (single or multivalued)
(@ID is always single valued)

ED DICT KD_MD_COLUMNS DSN (edit dictionary item DSN)

1 -> I (says this is an I-descriptor)
2 -> FIELD(@ID, @TM, 2) (part 2 of a compound key)
3 -> (oconv)
4 -> DSN (column heading)
5 -> 15L (length plus justification)
6 -> S (single or multivalued)

@ID (the default key on a file)
@KEY (a phrase listing all key fields)
@ (a phrase listing fields in default listing)
(add @KEY fields and ID.SUP if @KEY exists)

@INSERT (default fields for INSERT command)
@REVISE (used by the REVISE command)

CD KD_MD_COLUMNS (compile DICT if you have I-Descriptors)

================================================================
SELECT lists
================================================================
SELECT KD_MD_SYSTEM (creates an active list of IDs)
LIST KD_MD_SYSTEM (uses active select list)

SSELECT KD_MD_SYSTEM (creates sorted lists of IDs)
SAVE.LIST DS (saves list of IDs in &SAVEDLISTS&)

EDIT.LIST DS (allows you to change list of IDs)
ED &SAVEDLISTS& DS (same as above)

DELETE.LIST DS (deletes saved list)

GET.LIST DS (activates a saved list)
LIST KD_MD_SYSTEM (uses active save list)

================================================================
UV BASIC
================================================================
CREATE.FILE DS.BP 19 (do once. Type 19 is a UNIX dir with LONGNAMES)
ED DS.BP MY.BASIC.PROG (MY.BASIC.PROG is a record within DS.BP)
BASIC DS.BP MY.BASIC.PROG (to compile)
RUN DS.BP MY.BASIC.PROG (to run program)

also
UV.VI DS.BP MY.BASIC.PROG (to run UNIX vi on your program)

HELP BASIC OCONV (help on basic function OCONV)

================================================================
COMO files
================================================================
COMO ON DS (creates record named DS in &COMO&)

--- do some commands here ---
LIST VOC
.L

COMO OFF (ALWAYS turn como off)

ED &COMO& DS (all commands and output in here)

================================================================
Examples
================================================================
CREATE.FILE MYBP 19

ED MYBP PRINT.WHO
print @WHO
end

BASIC MYBP PRINT.WHO
RUN MYBP PRINT.WHO

================================================================

ED DICT MYBP LINES

1 -> I (says this is an I-descriptor)
2 -> DCOUNT(@RECORD, @FM) (counts number of lines in a program)
3 -> (oconv)
4 -> Lines (column heading)
5 -> 5R, (length plus justification)
6 -> S (single or multivalued)


LIST MYBP TOTAL LINES

ED DICT MYBP SIZE

1 -> I (says this is an I-descriptor)
2 -> LEN(@RECORD) (counts number of bytes in a program)
3 -> (oconv)
4 -> Bytes (column heading)
5 -> 5R, (length plus justification)
6 -> S (single or multivalued)

LIST MYBP TOTAL SIZE

================================================================

CREATE.FILE MY.INVOICE
Type=18
Modulo=11
Separation=2

ED MYBP INVOICE.ADD
================================================================

open 'MY.INVOICE' to INVOICE else stop

cust.id='BELLSOUTH'
prod.id1='111'
qty1=2
price1=10.00
cost1=8.00

inv.rec=''
inv.rec<2>=cust.id
inv.rec<6>=@date

line.item=1
inv.rec<1, line.item>=prod.id1
inv.rec<3, line.item>=qty1
inv.rec<4, line.item>=iconv(price1, 'MD2')
inv.rec<5, line.item>=iconv(cost1, 'MD2')

prod.id2='222'
qty2=4
price2=7.87
cost2=3.33

* better way
prod.attr=1
qty.attr=3
price.attr=4
cost.attr=5

line.item=2
inv.rec<prod.attr, line.item>=prod.id2
inv.rec<qty.attr, line.item>=qty2
inv.rec<price.attr, line.item>=iconv(price2, 'MD2')
inv.rec<cost.attr, line.item>=iconv(cost2, 'MD2')

prod.id3 ='223'
qty3     =5
price3   =1.21
cost3    =3.22

line.item=3
inv.rec<prod.attr, line.item>=prod.id3
inv.rec<qty.attr, line.item>=qty3
inv.rec<price.attr, line.item>=iconv(price3, 'MD2')
inv.rec<cost.attr, line.item>=iconv(cost3, 'MD2')

write inv.rec on INVOICE, '5'

end
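The inv.rec<attribute, value> writes above rely on UV dynamic arrays: a record is a single string whose fields are separated by field marks (@FM) and whose multivalues are separated by value marks (@VM). A minimal Python sketch of that addressing (illustrative only; the helper name dyn_write is mine, not a UV API):

```python
# Sketch of UniVerse dynamic-array writes: a record is one string whose
# fields are separated by @FM (chr 254) and multivalues by @VM (chr 253).
FM, VM = chr(254), chr(253)

def dyn_write(rec, field, value_no, data):
    """Mimic rec<field, value_no> = data, padding with empty slots."""
    fields = rec.split(FM) if rec else []
    while len(fields) < field:
        fields.append('')
    values = fields[field - 1].split(VM) if fields[field - 1] else []
    while len(values) < value_no:
        values.append('')
    values[value_no - 1] = str(data)
    fields[field - 1] = VM.join(values)
    return FM.join(fields)

inv = ''
inv = dyn_write(inv, 2, 1, 'BELLSOUTH')   # inv.rec<2> = cust.id
inv = dyn_write(inv, 1, 1, '111')         # inv.rec<1,1> = prod.id1
inv = dyn_write(inv, 1, 2, '222')         # inv.rec<1,2> = prod.id2
print(inv.split(FM)[0].split(VM))         # ['111', '222']
```

This is why line items stay together: each product attribute (fields 1, 3, 4, 5) holds one value per line item at the same value position.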

BASIC MYBP INVOICE.ADD
================================================================

RUN MYBP INVOICE.ADD
================================================================

LIST MY.INVOICE
================================================================

ED DICT MY.INVOICE PROD.ID
================================================================

1 -> D (says this is a D type dictionary)
2 -> 1 (field number)
3 -> (oconv)
4 -> PRODUCT (column heading)
5 -> 10L (length plus justification)
6 -> M (single or multivalued)
7 -> ProdMv (association)

ED DICT MY.INVOICE CUST.ID
================================================================

1 -> D (says this is a D type dictionary)
2 -> 2 (field number)
3 -> (oconv)
4 -> CUSTOMER (column heading)
5 -> 10L (length plus justification)
6 -> S (single or multivalued)

ED DICT MY.INVOICE QTY
================================================================

1 -> D (says this is a D type dictionary)
2 -> 3 (field number)
3 -> (oconv)
4 -> Qty (column heading)
5 -> 5R (length plus justification)
6 -> M (single or multivalued)
7 -> ProdMv (association)

ED DICT MY.INVOICE PRICE
================================================================

1 -> D (says this is a D type dictionary)
2 -> 4 (field number)
3 -> MD2 (oconv)
4 -> PRICE (column heading)
5 -> 12R (length plus justification)
6 -> M (single or multivalued)
7 -> ProdMv (association)

ED DICT MY.INVOICE COST
================================================================

1 -> D (says this is a D type dictionary)
2 -> 5 (field number)
3 -> MD2 (oconv)
4 -> COST (column heading)
5 -> 12R (length plus justification)
6 -> M (single or multivalued)
7 -> ProdMv (association)

ED DICT MY.INVOICE SOLD
================================================================

1 -> D (says this is a D type dictionary)
2 -> 6 (field number)
3 -> D4- (oconv for date field)
4 -> SOLD (column heading)
5 -> 12R (dates right justified)
6 -> S (single or multivalued)
7 -> (no association on singlevalued fields)
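The MD2 and D4- conversions in these dictionary items can be approximated in Python. This is a sketch under the usual assumptions: ICONV with MD2 scales dollars to integer cents, and UV internal dates count days from 31 December 1967; the real conversions support many more options.

```python
import datetime

def iconv_md2(s):
    """Approximate ICONV(x, 'MD2'): store dollars as integer cents."""
    return int(round(float(s) * 100))

def oconv_md2(n):
    """Approximate OCONV(n, 'MD2'): format cents back to dollars."""
    return f"{n / 100:.2f}"

UV_EPOCH = datetime.date(1967, 12, 31)    # internal day 0

def oconv_date(day_no):
    """Approximate OCONV(n, 'D4-'): internal day number to MM-DD-YYYY."""
    d = UV_EPOCH + datetime.timedelta(days=day_no)
    return d.strftime('%m-%d-%Y')

print(iconv_md2('10.00'))   # 1000
print(oconv_md2(787))       # 7.87
print(oconv_date(0))        # 12-31-1967
```

This is why INVOICE.ADD stores iconv(price1, 'MD2') while the dictionary puts MD2 in field 3: data is held internally as integers and only formatted on output.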

================================================================
locating multivalued data in ascending left-justified ('AL') order
 field 1 is sorted
 fields 2 thru 6 are controlled by field 1
 when field 1 gets a new value, so do fields 2 thru 6
================================================================

locate new.param in params<1> by 'AL' setting mv.no then
   if new.default <> params<2, mv.no> and params<2, mv.no> <> '' then
       params<3, mv.no>=new.default ; * last default.value
       params<4, mv.no>='Y' ; * new.default does change
   end else
       params<2, mv.no>=new.default ; * if null overwrite
   end
   params<6, mv.no>=id ; * last job to use this parameter
end else
* ================================================================
* controlling field is <1>
* ================================================================
   ins new.param before params<1, mv.no>
* ================================================================
* dependent fields 2 thru 6
* ================================================================
   ins new.default before params<2, mv.no> ; * first default.value
   ins new.default before params<3, mv.no>
   ins 'N' before params<4, mv.no>
   ins id before params<5, mv.no> ; * first job to use this parameter
   ins id before params<6, mv.no>
end
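The pattern above (LOCATE with BY 'AL' finds either the match or the sorted insertion point, and dependent fields are inserted at the same value position) can be sketched in Python with bisect. The names here are illustrative, not UV code:

```python
import bisect

# Parallel lists stand in for fields 1 and 2 of the params record;
# names (field 1, the controlling field) stays in ascending order,
# matching LOCATE's 'AL' (ascending, left-justified) sequencing.
names = ['ALPHA', 'GAMMA']
defaults = ['a', 'g']

def upsert(name, default):
    """Mimic LOCATE name IN params<1> BY 'AL' SETTING mv.no THEN/ELSE."""
    i = bisect.bisect_left(names, name)      # mv.no: match or insert point
    if i < len(names) and names[i] == name:  # THEN branch: value found
        if not defaults[i]:
            defaults[i] = default            # if null, overwrite
    else:                                    # ELSE branch: not found
        names.insert(i, name)                # INS before controlling field
        defaults.insert(i, default)          # dependent field stays in step

upsert('BETA', 'b')
print(names)   # ['ALPHA', 'BETA', 'GAMMA']
```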

================================================================
locating multivalued data by scanning each value
 field 1 contains a string
================================================================
search.string='Src'
no.params=dcount(params<1>, @VM)
found=@false
for i=1 to no.params until found
   param=params<1,i>
   if index(param, search.string, 1) > 0 then
      found=@true
   end
next i
if found then
   * matching parameter found; act on it here
end
================================================================
UVODBC
================================================================
Controlled by 3 files

1. $UVHOME/.odbc.ini
2. $UVACCOUNT/uvodbc.config
3. $INFORMIXDIR/etc/sqlhosts

You should not have to shutdown DataStage and restart it.
Try DS_CONNECT to test it or view data within a job.
================================================================

UVHOME=`cat /.uvhome` ; # works on any UNIX box
UVACCOUNT="/bto/appl/uv/dsproj" ; # just an example
INFORMIXDIR="/bto/appl/informix" ; # just an example
================================================================

cd $UVHOME
vi .odbc.ini
================================================================
[ODBC Data Sources]
Oracle7=
Informix=
ORSAFD1=
ORSTAGD1=
ORSTAGP1=
ORNODSD1=
ORNODSP1=
ORSDDSP1=
ORDEVTST=
ORSAFT1=
BARNEYCCCODBC=
ORSAFUAT=
icsdm=

[Oracle7]
Driver=/bto/appl/uv/../branded_odbc/lib/VMor713.so
Description=Oracle7
ServerName=oraclehost
LogonID=odbc01
Password=odbc01

[ORSAFD1]
Driver=/bto/appl/uv/../branded_odbc/lib/VMor815.so
Description=Oracle8
ServerName=ORSAFD1

[SNAPSHOTS]
QEWSD=36952
Driver=/bto/appl/uv/../branded_odbc/lib/VMinf15.so
database=snapshots
hostname=barney.bst.bls.com
Service=acac
Servername=icsdm
Description=Informix ODBC connection to Barney's snapshots database

[SNAPALT]
QEWSD=36952
Driver=/bto/appl/uv/../branded_odbc/lib/VMinf15.so
database=snapshots
hostname=90.14.152.43
Service=acac
Servername=icssnap
Description=Informix ODBC connection to E3500's snapshots database

================================================================
dev .odbc.ini looks like this
================================================================
[SNAPSHOTS]
QEWSD=36952
Driver=/bto/appl/uv/../branded_odbc/lib/VMinf15.so
database=snapshots
hostname=barney.bst.bls.com
Service=acac
Servername=icsdm
Description=Informix ODBC connection to Barney's snapshots database

[SNAPALT]
QEWSD=36952
Driver=/bto/appl/uv/../branded_odbc/lib/VMinf15.so
database=snapshots
hostname=90.14.152.43
Service=acac
Servername=icssnap
Description=Informix ODBC connection to E3500's snapshots database

================================================================

cd $UVACCOUNT
vi uvodbc.config
================================================================
[ODBC DATA SOURCES]
<localuv>
DBMSTYPE = UNIVERSE
network = TCP/IP
service = uvserver
host = localhost

<ORSAFD1>
DBMSTYPE = ODBC

<SNAPSHOTS>
DBMSTYPE = ODBC

<SNAPALT>
DBMSTYPE = ODBC
================================================================

cd $INFORMIXDIR/etc
vi sqlhosts
================================================================
icsdm ontlitcp 90.14.152.9 acac
# The following points to the "alternate BARNEY":
icssnap ontlitcp 90.14.152.43 acac
================================================================
DataStage exports and imports
================================================================
After importing, you need to recompile both jobs and routines.
================================================================

================================================================
SQL Against UV files
================================================================
Use ' not ".
Do not quote numbers.
Use ; at the end of each statement.
HELP SQL will show you the syntax.
Multivalues are very difficult to update in SQL.
Store your SQL programs in your BP.
Create remote pointers to run them.
SELECT * will not work without @SELECT.
INSERT may not work without @REVISE.

UV.VI KDBP COUNT.PROD.SQL
================================================================
PA
SELECT
   PROD_ID,
   COUNT(*)
FROM
   HaNodsSoProdIDs.DF
WHERE
   COMPLETION_DATE BETWEEN 20010401 AND 20010430
GROUP BY
   PROD_ID
ORDER BY
   PROD_ID
;


ED VOC SQL.COUNT.PROD
================================================================
R
KDBP
COUNT.PROD.SQL


SQL.COUNT.PROD          (to run program)
================================================================
PROD_ID    COUNT ( * )

    -99         182714
    -98         559953
     -1         185902
      1        1496174
      2             47
      3           2113
      4            635
      5          16632
      6            219
     10          15823
     31             61
     32          29189
     36        1067180
     37          79778
     40          51007
     41            223
     42          17877
     43           2224
     48           1527
     49              3
     51            129
     52           1329
     79            253
     85              3
     86            870
     91             14
     92             12
    100           1446
    101             36
    102            250
    110              4
    112              1
    141            605
    142             96
    155          51506
    156           2373
    157            455
    999           6316

38 records listed.
>

================================================================
Creating a hash file from an ODBC connection
This creates a hash file called DS1 from STREAM_DEF
================================================================

PA
DELETE.FILE DS1
CONNECT ORSILOD1
DATA user
DATA password
DATA .U DS1
DATA SELECT * FROM STREAM_DEF WHERE BATCH_NUM LIKE '86%' AND BATCH_OPER = 'DSRUN';
DATA N
DATA .Q

================================================================
program to create distributed hash file
================================================================
      prog.id='CREATE.MULTI.PART.FILE'
      usage=prog.id:' filename no.of.parts dictname'
      prompt=''
      true=1
      false=0
      get(arg.)filename else
         print usage
         stop
      end
      get(arg.)no.of.parts else
         print usage
         stop
      end
      get(arg.)dictname else
         print usage
         stop
      end
      get(arg.)run.prog else run.prog=false
      if not(num(no.of.parts)) then
         print usage
         stop
      end
      if no.of.parts<1 or no.of.parts>10 then
         print 'No.of.parts too big or too small'
         stop
      end
      open 'VOC' to VOC else stop
      open 'DICT', filename to MASTER.FILE else
         print filename:' does not exist'
         stop
      end
      read master.rec from VOC, filename else stop
      if master.rec<1>[1,1] <> 'F' then
         print filename:' is not an F-pointer'
         stop
      end
      read dict.rec from MASTER.FILE, dictname else
         print 'No ':dictname:' in DICT, ':filename
         stop
      end
      df.file=filename:".DF"
      ecmd='DEFINE.DF ':df.file:' INTERNAL ALGORITHMS ':dictname
      print ecmd
      if run.prog then execute ecmd
      for i=1 to no.of.parts
         new.file=df.file:'.':i
         ecmd='CREATE.FILE DATA ':new.file:' DYNAMIC'
         print ecmd
         if run.prog then execute ecmd
         read voc.rec from VOC, new.file then
            voc.rec<3>=master.rec<3>
            if run.prog then write voc.rec on VOC, new.file
         end else
            print new.file:' not created'
            if run.prog then stop
         end
         ecmd='DEFINE.DF ':df.file:' ADDING ':new.file:' ':i:' INTERNAL ALGORITHMS ':dictname
         print ecmd
         if run.prog then execute ecmd
      next i
   end

================================================================

Thursday, December 27, 2012

Conformed Dimensions with example


In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact table in the structure. Conformed dimensions allow facts and their measures to be accessed in the same way across multiple fact tables, ensuring consistent reporting across the enterprise.
A conformed dimension can exist as a single dimension table that relates to multiple fact tables within the same data warehouse, or as identical dimension tables in separate data marts.  


As you can see in the above figure, the time and cust dimensions are called conformed dimensions, as they are shared across multiple fact tables with the same meaning.


BusinessObjects - list of values (LOV)


What’s a list of values (LOV)


A list of values contains the data values associated with an object. A LOV can have two origins: a database or an external file such as a text or an Excel file.

LOV from Database.

            By default, when you drag and drop a table column to create an object, Business Objects will create an object with an associated LOV. The LOV's name will be a generic one until you change it; the "Restore Default" button will give it back its generic name.

            A LOV is generated by a simple SQL statement like "SELECT DISTINCT TABLE.COLUMN FROM TABLE", but you can modify the statement by adding other objects through the Query Panel, and you can also add conditions on these objects.

            You can't associate a LOV coming from another database, because the LOV is linked through the connection; you can, however, use a call to a DBLINK.

            You can apply an ascending or a descending sort on the LOV, again through the Query Panel.

Particular example: how do you sort the list of values by a number rather than a label, when you want the label in the first column of your LOV?
If you have the month number and the 12 month names as labels, you want your list of values to be sorted as follows:
                                                                 January   1
                                                                 February  2
                                                                 March     3
                                                                 .......
So you create your list of values as Month + month number, but it now appears like this:

                                                                April      4
                                                                August     8
........

The SQL of the query is: SELECT DISTINCT MONTH.MONTH, MONTH.NUMBER FROM MONTH

Solution: edit the SQL, add ORDER BY NUMBER, MONTH and check the option "Do not generate SQL before execution"; that will solve the problem. You can also apply a sort on the number.

 LOV from a flat file


You can also assign personal data to a list of values. This is very useful if you want to allow users to select values among those stored in an Excel file; for example, define a personal LOV on product based on Myproduct.xls.

To do this, in the Designer module, you have to:
-          Select the 'Lists of Values' command from the Tools menu
-          Select the object whose list of values you wish to edit
-          Click the 'Personal Data' button
-          Select a file and a format and run it.
           
If your external file is sorted in a particular way, note that Business Objects will automatically modify the sort and apply an ascending sort.

Note that you can't have a dynamic link with the external file; you have to refresh the LOV whenever the external file changes.

You can allow 3 things on a LOV based on the database:
-          Allow users to edit the LOV
-          Automatic refresh before use
-          Export with universe (before being exported, a LOV must be generated by a display or an edit)

Wednesday, December 26, 2012

What is an ETL Mapping Document? A Real-Time Example

One of the regular viewers of this blog asked me to explain the importance of the ETL mapping document.

What is an ETL Mapping Document:
                               The ETL mapping document contains the source, target and business rules information; it is the most important document for the ETL developer to design and develop the ETL jobs.

                             A typical mapping document should contain the following information:

1) Mapping Indicator (values A: Add, D: Delete, C: Change)
2) Change Description (used to indicate mapping changes)
3) Key Indicator (indicates whether the field is a primary key or not)
4) Source Table/File Name
5) Source Field Name
6) Source Field Data Type
7) Source Field Length
8) Source Field Description (the description will be used as metadata for the end user)
9) Business Rule
10) Target Table Name
11) Target Field Name
12) Target Data Type
13) Target Field Length
14) Comment

You can add more fields based on your requirement.

Real-Time Example:
Let's take a look at a real-time example. Consider that we are getting a source file called "Employee_info" which contains employee information, and it should get loaded into the EMP_DIM table in the target.

One of the business requirements is to convert Sales --> S, Packing --> P and Transport --> T.


Row id  Emp First Name  Emp Last Name  EMP No  EMP Dep
1       John            Beck           2001    Sales
2       Mike            Morry          2002    Sales
3       Kevin           Peter          2003    Packing
4       Steve           Morry          2004    Packing
5       Jim             Chen           2005    Transport
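A hypothetical sketch of how that business rule might be coded (illustrative Python, not the actual DataStage transformer logic; the names are made up):

```python
# Business rule from the mapping document: abbreviate the EMP Dep codes.
DEP_CODES = {'Sales': 'S', 'Packing': 'P', 'Transport': 'T'}

rows = [
    ('John', 'Beck', '2001', 'Sales'),
    ('Kevin', 'Peter', '2003', 'Packing'),
    ('Jim', 'Chen', '2005', 'Transport'),
]

# Apply the rule on the way into EMP_DIM.
loaded = [(fn, ln, emp_no, DEP_CODES[dep]) for fn, ln, emp_no, dep in rows]
print(loaded[0])   # ('John', 'Beck', '2001', 'S')
```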


Here is how the mapping will look:



Once the mapping is prepared, it needs to be reviewed with the end users to get their sign-off.

Let's say that during the review the users want to add "E" before the EMP_NUM field in the target.
The mapping document should be changed first; here is how it will look after the change



 As you can see in the image, the "Mapping Indicator" column got changed from A to C and "Change Description" contains the change details.

Types of Database Schemas

I am going to discuss two types of schema here: Star Schema and Snowflake Schema.


Star Schema

          A single fact table surrounded by multiple dimension tables is called a Star Schema. It doesn't capture hierarchies directly.




Benefits of a Star Schema

·         Performance (Integer relationships, natural partitioning, Single joins benefit SQL optimizer)

·         Source system independence and multiple integration

·         Supports Change management

·         Usability/Simplicity (easy to read, interpret, join, calculate)

·         Presentation (Consistency, Taxonomy, Labeling)

·         Reuse (Conformed dimensions reduce redundancy, Role-plays)



Snowflake Schema

            It represents the dimension hierarchy directly by normalizing the dimension tables.

 


Data Warehousing Objects



Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas.
Fact tables are the large tables in your warehouse schema that store business measurements. Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data, usually numeric and additive, that can be analyzed and examined. Examples include sales, cost, and profit.
Dimension tables, also known as lookup or reference tables, contain the relatively static data in the warehouse. Dimension tables store the information you normally use to constrain queries. Dimension tables are usually textual and descriptive and you can use them as the row headers of the result set. Examples are customers or products.

Fact Tables

A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
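A small Python illustration of why the additivity distinction matters (the numbers are made up): a stored average cannot be re-aggregated directly, but a ratio can always be rebuilt from its additive components.

```python
# Daily grain: sales amount and unit count are both additive facts.
days = [
    {'sales': 100.0, 'units': 4},
    {'sales': 300.0, 'units': 8},
]

total_sales = sum(d['sales'] for d in days)                   # additive: OK
avg_price_per_day = [d['sales'] / d['units'] for d in days]   # 25.0, 37.5

# WRONG: averaging the stored per-day averages gives 31.25
naive = sum(avg_price_per_day) / len(days)

# RIGHT: rebuild the ratio from the additive facts: 400 / 12 = 33.33...
correct = total_sales / sum(d['units'] for d in days)
print(round(naive, 2), round(correct, 2))   # 31.25 33.33
```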

Dimension Tables

A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis. These natural rollups or aggregations within a dimension table are called hierarchies.


Unix Vi Quick Reference



vi Notes
----------------------------------------------------------------------
Modes:
   vi
      command
      edit (ESC to goto command mode)
   ex - line editor
      : prompt
      :vi  (to goto visual mode)
----------------------------------------------------------------------
vi Commands:
   a - append after cursor
   A - append after end of line
   c - change needs scope (ex: 3cw would change the next 3 small words)
   cc - change line
   C - change from cursor to end of line
   d - delete scope (ex. 5dW would delete next 5 big words)
   dd - deletes line
   D - deletes to end of line
   i - insert before cursor
   I - insert at beginning of line (not counting whitespace)
   o - opens new line after cursor
   O - opens new line before cursor
   p - "put" or paste after cursor from cut & paste buffer
   "3p - paste from 3 edit
   P - pastes before cursor
   r - replace current char
   R - replace till ESC pressed
   s - substitute till ESC pressed
   u - undo
   U - undo all changes on this line
   x - delete 1 char
   yy - "yank yank" or copy current line into cut & paste buffer
   ZZ - save and quit
   . - repeat last change
   >> - indent 1 shiftwidth
   << - outdent 1 shiftwidth
   ^V - in edit mode will insert control codes
   /regexpression - finds regular expression
   ?regexpression - finds regular expression towards top of file
----------------------------------------------------------------------
Regular expressions: (wild cards)
   ^ - beginning of a line not counting whitespace
   0 - goto first char on line
   $ - end of a line
   \ - gets rid of magic
   \$ - matches a dollar sign
   . - matches any char
   [4-7] - matches numbers 4 thru 7
   [A-Z] - matches A thru Z
   [j-r] - matches j thru r
   * - zero or more of preceding expression
   ^[A-Z][0-9][0-9]* - would match a capital letter then at least one digit at the beginning of a line
----------------------------------------------------------------------
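As a quick sanity check, that last pattern behaves the same in Python's re module, which agrees with vi on these basic constructs:

```python
import re

# Capital letter followed by one-or-more digits, anchored at line start.
pat = re.compile(r'^[A-Z][0-9][0-9]*')

print(bool(pat.match('A42 rest')))   # True
print(bool(pat.match('a42')))        # False: lowercase first char
print(bool(pat.match('AB')))         # False: no digit after the letter
```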

----------------------------------------------------------------------
vi Movement:
   ^ - beginning of line
   $ - end of line
   b - back 1 small word
   B - back 1 big word
   e - end of small word
   E - end of big word
   G - goto end of file
   2G - goes to line 2
   h - moves to left
   j - moves down 1 line
   l - moves 1 char to right
   k - moves up 1 line
   w - move 1 small word
   W - move 1 big word
----------------------------------------------------------------------
vi Scope:
   w,W,b,B, etc
----------------------------------------------------------------------
vi Notes:
   most commands can be preceded by a number for multiple occurrences
      (example 3dd - would delete the next 3 lines)
   a small word ends at whitespace or punctuation (){}[],.!?\|:;'"<>-_@#$%^&*
   a big word ends only at whitespace
   whitespace is tabs, spaces, and newlines.
----------------------------------------------------------------------

----------------------------------------------------------------------
ex commands:
   g - global; run a command on all lines matching a regular expression
   map  - map control codes or function keys to macros
   q - quit
   r - reads in file
   s - substitute
   so - execute macro file
   set - set options
   w - write
----------------------------------------------------------------------
ex Examples:
   :7 - goes to line 7
   :4,6d - deletes lines 4 thru 6
   :r file - reads file into current doc after current line
   :r !command - reads results of UNIX command into doc
   :r !date - would read in date command
   :4,9!command - pipes lines 4 thru 9 to UNIX command
   :%!sort - would sort all lines
   :3,8!sort - would sort lines 3 thru 8
   :%g/^print/d - would delete all lines that start with print
   :%s/ouput/output/g - would change ouput to output on all lines
   :set nows - nowrapscan; searches do not wrap around to the beginning of the file
   :set ic - ignore case on searches
   :set nonu - no line numbers
   :set all - to see all the options
   :map #1 1G - map function key 1 to goto line 1
   :map #2 G - map function key 2 to goto last line
   :map #3 :set ts=40^M - map f3 to set tabstops to 40 chars
   :map #4 :set sw=3^M - map f4 to set shiftwidth to 3 chars
   :map #5 :set ai^M - map f5 to autoindent
   :map #6 :%s/^I/   /g^M - map f6 to change tabs (^I) to 3 spaces
   :map ^O :%s/display/print/g^M - map ^O to change display to print
----------------------------------------------------------------------
Saving and Quitting:
   :q! - force a quit even on a modified file
   :wq - write and quit
   :wq! - force a write then quit (fails if you lack write permission)
   :w newfilename - write to newfilename
   :q - quit
----------------------------------------------------------------------
ex Scope:
   . - current line
   $ - last line
   5,7 - lines 5 thru 7
   1,$ - all lines
   % - all lines
   .,$ - current line to end of file
----------------------------------------------------------------------
ex Notes:
   most commands require scope
   the scope is the range of lines to issue the command on
----------------------------------------------------------------------
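The same ex commands can be run non-interactively, which makes them scriptable. A minimal sketch, assuming `ex` is installed (it is on most Unix systems); the file name and contents are invented for illustration:

```shell
# Create a sample file with a repeated typo
printf 'ouput one\nouput two\n' > /tmp/ex_demo.txt

# Feed ex a script: %s/ouput/output/g fixes all lines, wq writes and quits
ex -s /tmp/ex_demo.txt <<'EOF'
%s/ouput/output/g
wq
EOF

cat /tmp/ex_demo.txt
```

This is exactly the `:%s/ouput/output/g` example from the list above, driven from the shell instead of the : prompt.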




PERL Quick References


     PERL (Practical Extraction and Report Language) Quick References

Variables:
    FIZZLE   A filehandle or directory handle.
   $FIZZLE   A scalar variable.
   @FIZZLE   An array indexed by number.
   %FIZZLE   A hash: an array indexed by string.
   &FIZZLE   Subroutine
   *FIZZLE   Everything named FIZZLE.

Special Variables:
   <STDIN>,<STDOUT>,<STDERR>
   $a = <STDIN>   sets $a to the next line of standard input.
   @a = <STDIN>   sets @a to all the rest of the input lines.
   <ARGV>         Reads from the files supplied on the command line.
   @ARGV          Lists of filenames from command line.
   $ARGV          Holds the name of the current file.
   $#ARGV         Holds the index of the last element of @ARGV. (base is 0)
   $_             Holds the current line.
   $&             Holds the text of what you matched.
   $`             Holds everything before the match.
   $'             Holds everything after the match.
   $@             Holds the error message from the last eval.
   $[             Holds the current array base, ordinarily 0.

Operator Precedence: Lowest to highest
   Associativity      Operators
        nonassoc      The list operators (eg. print,sort,chmod)
            left      ,
           right      =,+=,-=,*=, etc.
           right      ?:
        nonassoc      ..
            left      ||
            left      &&
            left      |,^
            left      &
        nonassoc      ==,!=,<=>,eq,ne,cmp
         nonassoc      <,>,<=,>=,lt,gt,le,ge
        nonassoc      The named unary operators (eg. chdir)
        nonassoc      -r,-w,-x, etc.
            left      <<,>>
            left      +,-,.
            left      *,/,%,x
            left      =~,!~
            right      !,~, and unary minus
        nonassoc      ++,--
            left      '('

Scalar Operators:
   Pattern Matching
   !~     Not match
   (i.e. $a !~ /pat/ "true if $a does not contain pattern").
   =~      Match
   (i.e. $a =~ /pat/ "true if $a contains pattern").
   =~      Substitution
   (i.e. $a =~ s/p/r/ "replace occurrences of p with r in $a").
   =~      Translation
   (i.e. $a =~ tr/a-z/A-Z/).

   Logical Operators
   $a && $b   And (true if $a is true and $b is true).
   $a || $b   Or ($a if $a is true otherwise $b).
   ! $a       Not (true if $a is not true).

   Arithmetic Operators
   $a + $b     Add
   $a - $b     Subtract
   $a * $b     Multiply
   $a / $b     Divide
   $a % $b     Modulus
   $a ** $b    Exponentiate
   ++$a,$a++   Autoincrement
   --$a,$a--   Autodecrement
   rand($a)    Random

   String Operations
   $a . $b           Concatenation
   $a x $b           Repeat  (value of $a strung together $b times)
   substr($a,$o,$l)  Substring  (Substring at offset $o of length $l)
   index($a,$b)      Index (Offset of string $b in string $a)

   Assignment Operations
   $a = $b    Assign         $a gets the value of $b
   $a += $b   Add to         Increase $a by $b
   $a -= $b   Subtract from       Decrease $a by $b

   Test Operations
      Numeric      String      Meaning
        ==           eq        Equal to
        !=           ne        Not equal to
         >           gt        Greater than
        >=           ge        Greater than or equal to
         <           lt        Less than
        <=           le        Less than or equal to
        <=>         cmp        Comparison, with signed return (-1, 0, 1)
 
   File Operations
   -r $a   File is readable by effective uid.
   -R $a   File is readable by real uid.
   -w $a   File is writable by effective uid.
   -W $a   File is writable by real uid.
   -x $a   File is executable by effective uid.
   -X $a   File is executable by real uid.
   -o $a   File is owned by effective uid.
   -O $a   File is owned by real uid.
   -e $a   File exists.
   -z $a   File has zero size.
   -s $a   File has non-zero size (returns size in bytes).
   -f $a   File is a plain file.
   -d $a   File is a directory.
   -l $a   File is a symbolic link.
   -p $a   File is a named pipe (FIFO).
   -S $a   File is a socket.
   -b $a   File is a block special file.
   -c $a   File is a character special file.
   -u $a   File has setuid bit set.
   -g $a   File has setgid bit set.
   -k $a   File has sticky bit set.
   -t $a   Filehandle is opened to a tty.
   -T $a   File is a text file.
   -B $a   File is a binary file.
   -M $a   Age of file (at startup) in days since modification.
   -A $a   Age of file (at startup) in days since last access.
   -C $a   Age of file (at startup) in days since inode change.

   open(MYFILE,"> myfilename");            Create file.
   open(MYFILE,">> myfilename");           Append to file.
   open(MYFILE,"| output-pipe-command");   set up output filter.
   open(MYFILE,"input-pipe-command|");     set up input filter.

   Named Unary Operations
   alarm    getprotobyname      log      sin
   chdir    gethostbyname       ord      sleep
   cos      getnetbyname        oct      sqrt
   chroot   gmtime              require  srand
   exit     hex                 reset    umask
   eval     int                 rand     exp
   length   rmdir               getpgrp  localtime
   readlink

Arrays:
   $a = <STDIN>;  sets $a to next input line.
   @a = <STDIN>;  sets @a to all the rest of the input lines.
   @a = (1..3);   same as @a = (1, 2, 3);
   @a = ();       same as $#a = -1; i.e. make null list.

Associative Arrays:
   %a = ( 'Mon', 'Monday', 'Tue', 'Tuesday', );  Assignment
   $b = $a{'Mon'};                               To access
   %a = ();                                      Make null list
   %ENV      Built in      i.e. $home = $ENV{'HOME'}
   %SIG      Built in

Intrinsic Functions:
   shift(@a)
   unshift(@a,$b)
   pop(@a)
   push(@a,$b)
   split(/pat/,$a)       # splits string $a into a list
   splice(@a,$offset,$length[,LIST])   # remove/replace a slice (like MATPARSE)
   join('char',@a)       # concatenates a list into a string (like REUSE)
   format
   local()
   stat()
   chop($a)
   seek(HLD,$byteoffset,$whence)
   read(HLD,$record,$recordsize)
   opendir(DIRHLD,$dir)
   readdir(DIRHLD)
   closedir(DIRHLD)

chop($number = <STDIN>);    # input number and remove newline

which means the same thing as

$number = <STDIN>;          # input number
chop($number);              # remove newline

%longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday",
            "Wed", "Wednesday", "Thu", "Thursday", "Fri",
            "Friday", "Sat", "Saturday");

Because it is sometimes difficult to read a hash that is defined
like this, Perl provides the => (equal sign, greater than) sequence
as an alternative separator to the comma. Using this syntax (and
some creative formatting), it is easier to see which strings are
the keys, and which strings are the associated values.

%longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
    "Wed" => "Wednesday",
    "Thu" => "Thursday",
    "Fri" => "Friday",
    "Sat" => "Saturday",
);

$answer = 42;               # an integer
$pi = 3.14159265;           # a "real" number
$avocados = 6.02e23;        # scientific notation
$pet = "Camel";             # string
$sign = "I love my $pet";   # string with interpolation
$cost = 'It costs $100';    # string without interpolation
$thence = $whence;          # another variable
$x = $moles * $avocados;    # an expression
$cwd = `pwd`;               # string output from a command
$exit = system("vi $x");    # numeric status of a command
$fido = new Camel "Fido";   # an object

@home = ("couch", "chair", "table", "stove");

  or

$home[0] = "couch";
$home[1] = "chair";
$home[2] = "table";
$home[3] = "stove";

Logical Operators
   Example      Name   Result
   $a && $b     And    $a if $a is false, $b otherwise
   $a || $b     Or     $a if $a is true, $b otherwise
   ! $a         Not    True if $a is not true
   $a and $b    And    $a if $a is false, $b otherwise
   $a or $b     Or     $a if $a is true, $b otherwise
   not $a       Not    True if $a is not true

Numeric and String Comparison Operators
   Comparison              Numeric   String   Return Value
   Equal                   ==        eq       True if $a is equal to $b
   Not equal               !=        ne       True if $a is not equal to $b
   Less than               <         lt       True if $a is less than $b
   Greater than            >         gt       True if $a is greater than $b
   Less than or equal      <=        le       True if $a not greater than $b
   Greater than or equal   >=        ge       True if $a not less than $b
   Comparison              <=>       cmp      0 if equal, 1 if $a greater, -1 if $b greater

File Test Operators
   Example   Name        Result
   -e $a     Exists      True if file named in $a exists
   -r $a     Readable    True if file named in $a is readable
   -w $a     Writable    True if file named in $a is writable
   -d $a     Directory   True if file named in $a is a directory
   -f $a     File        True if file named in $a is a regular file
   -T $a     Text File   True if file named in $a is a text file

$a = 5;        # $a is assigned 5
$b = ++$a;     # $b is assigned the incremented value of $a, 6
$c = $a--;     # $c is assigned 6, then $a is decremented to 5
$line .= "\n";  # Append newline to $line.
$fill x= 80;    # Make string $fill into 80 repeats of itself.
$val ||= "2";   # Set $val to 2 if it isn't already set.
$a = 123;
$b = 3;
print $a * $b;     # prints 369
print $a x $b;     # prints 123123123

while (defined ($line = <DATAFILE>)) {
    chomp $line;
    $size = length $line;
    print "$size\n";                # output size of line
}

Because this is a common operation and that's a lot to type, Perl
gives it a shorthand notation. This shorthand reads lines into
$_ instead of $line. Many other string operations use $_ as a
default value to operate on, so this is more useful than it may
appear at first:

while (<DATAFILE>) {
    chomp;
    print length, "\n";             # output size of line
}

@lines = <DATAFILE>;

$whole_file = <FILE>;               # 'slurp' mode

% perl -ne 'BEGIN { $/="%%\n" } chomp; print if /Unix/i' fortune.dat

The truncate function changes the length of a file, which can be
specified as a filehandle or as a filename. It returns true if the
file was successfully truncated, false otherwise:

truncate(HANDLE, $length)
    or die "Couldn't truncate: $!\n";
truncate("/tmp/$$.pid", $length)
    or die "Couldn't truncate: $!\n";

seek(LOGFILE, 0, 2)         or die "Couldn't seek to the end: $!\n";
seek(DATAFILE, $pos, 0)     or die "Couldn't seek to $pos: $!\n";
seek(OUT, -20, 1)           or die "Couldn't seek back 20 bytes: $!\n";

The sysread and syswrite functions are different from their <FH>
and print counterparts. They both take a filehandle to act on, a
scalar variable to either read into or write out from, and the
number of bytes to read or write. They can also take an optional
fourth argument, the offset in the scalar variable to start reading
or writing at:

$written = syswrite(DATAFILE, $mystring, length($mystring));
die "syswrite failed: $!\n" unless $written == length($mystring);
$read = sysread(INFILE, $block, 256, 5);
warn "only read $read bytes, not 256" if 256 != $read;

$count = `wc -l < $file`;
die "wc failed: $?" if $?;
chomp($count);

You could also open the file and read line-by-line until the end,
counting lines as you go:

open(FILE, "< $file") or die "can't open $file: $!";
$count++ while <FILE>;
# $count now holds the number of lines read

Here's the fastest solution, assuming your line terminator
really is "\n":

$count += tr/\n/\n/ while sysread(FILE, $_, 2 ** 16);

Processing Every Word in a File

while (<>) {
    for $chunk (split) {
        # do something with $chunk
    }
}

#define UT_LINESIZE           12
#define UT_NAMESIZE           8
#define UT_HOSTSIZE           16

struct utmp {                   /* here are the pack template codes */
    short ut_type;              /* s for short, must be padded      */
    pid_t ut_pid;               /* i for integer                    */
    char ut_line[UT_LINESIZE];  /* A12 for 12-char string           */
    char ut_id[2];              /* A2, but need x2 for alignment    */
    time_t ut_time;             /* l for long                       */
    char ut_user[UT_NAMESIZE];  /* A8 for 8-char string             */
    char ut_host[UT_HOSTSIZE];  /* A16 for 16-char string           */
    long ut_addr;               /* l for long                       */
};
$APPDFLT = "/usr/local/share/myprog";

do "$APPDFLT/sysconfig.pl";
do "$ENV{HOME}/.myprogrc";
If you want to ignore the system config file when the user has their own, test the return value of the do.

do "$APPDFLT/sysconfig.pl"
    or
do "$ENV{HOME}/.myprogrc";

($red, $green, $blue) = (0..2);
($name, $pw, $uid, $gid, $gcos, $home, $shell) = split(/:/, <PASSWD>)

Pattern Matching
.           Matches any character except newline
[a-z0-9]    Matches any single char in set
[^a-z0-9]   Matches any single char not in set
\d          Matches a digit, same as [0-9]
\D          Matches a non-digit, same as [^0-9]
\w          Matches an alphanumeric (word) char [a-zA-Z0-9_]
\W          Matches a non-word char [^a-zA-Z0-9_]
\s          Matches a whitespace char (space, tab, newline...)
\S          Matches a non-whitespace char

\n          Matches a newline
\r          Matches a return
\t          Matches a tab
\f          Matches a formfeed
\b          Matches a backspace (inside [] only)
\0          Matches a null char
\000        Matches a null char (octal notation)
\metachar   Matches the char itself (\|,\.,\*...)

(abc)       Remembers the match for later backreferences
\1          Matches whatever the first set of parens matched
\2          Matches whatever second set of parens matched
\3          and so on ...

x?          Matches 0 or 1 x's where x is any of above
x*          Matches 0 or more x's
x+          Matches 1 or more x's
x{m,n}      Matches at least m x's but no more than n

abc         Matches all of a, b and c in order
fee|fie|foe Matches one of fee, fie or foe

\b          Matches a word boundary (outside of [] only)
\B          Matches a non-word boundary
^           Anchors match to beginning of line or string
$           Anchors match to end of line or string