[HW/OS] 2011. 7. 11. 10:35
1. Manufacturer
   - Vendor name = lsattr -El sys0
   - Model name  = prtconf (hardware configuration)
 
2. OS version
   - Version = oslevel -r
 
3. CPU
   - Arch  = prtconf
   - Clock = lsattr -El proc0 (in Hz)
   - Count = lsdev -Cc processor | wc -l
 
4. Memory
   - Size = lsattr -El sys0, prtconf
 
5. Virtual Memory
   - Size = lsps -a
 
6. Internal Disk
   - Size  = bootinfo -s hdisk# (in MB)
   - Count = lsdev -Cc disk
 
7. External Disk (SSA)
   - Size  = bootinfo -s hdisk# (in MB)
   - Count = lsdev -Cc pdisk, lsdev -Ct hdisk
 
8. rootvg mirror (y/n)
   - not mirrored = lsvg rootvg (ACTIVE PVs = 1)
   - mirrored     = lsvg rootvg (ACTIVE PVs = 2)
 
9. NIC
   - Speed, count = lsparent -Ck ent
 
10. Location of the parts installed in the system
   - lscfg -vp
   (see the collection script sketch right after this list)
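
A minimal collection script that strings the commands above together, assuming a standard AIX box where they are in the default PATH (the output file name is arbitrary, and bootinfo -s normally requires root):

#!/bin/ksh
# Sketch: dump a basic HW/OS inventory into one file.
OUT=/tmp/sysinfo.$(hostname).txt

{
  echo "### Model / serial / memory";  lsattr -El sys0
  echo "### OS level";                 oslevel -r
  echo "### CPU count";                lsdev -Cc processor | wc -l
  echo "### CPU attributes";           lsattr -El proc0
  echo "### Paging space";             lsps -a
  echo "### Disks";                    lsdev -Cc disk
  # Size of each hdisk in MB (first field of the lsdev output is the device name).
  for d in $(lsdev -Cc disk | awk '{print $1}'); do
      echo "$d: $(bootinfo -s $d) MB"
  done
  echo "### rootvg (ACTIVE PVs = 2 usually means mirrored)"; lsvg rootvg
  echo "### Network interfaces";       lsdev -Cc if
} > $OUT

echo "Inventory written to $OUT"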
 
* General information
prtconf              = list system configuration
lscfg [-v]           = devices (-v = verbose: microcode level, firmware, etc.)
lsdev -Cc adapter    = adapter cards
lsdev -Cc disk       = disks
lsdev -Cc processor  = CPUs
lsattr -El sys0      = serial number, model number, memory
* AIX-related information
oslevel              = AIX OS level
instfix -i | grep ML = AIX maintenance level
lslpp -l             = installed SW and levels
* Disk-related information
lsvg -o              = active volume groups
lsvg -p vgname       = disk drives in a VG
lsvg -l vgname       = LVs in a VG
lslv lvname          = LV detail
lslv -l lvname       = LV disk location
lspv                 = disks
lspv -l hdisk#       = LVs residing on a disk

* Network-related information
lsdev -Cc if         = list network interfaces
netstat -rn          = list network gateways
[Cluster/HA] 2011. 7. 11. 09:34
# clcheck_server grpsvcs; print $?
  A return code of 1 indicates services have started.

# lssrc -ls clstrmgrES | grep state
  A current state of ST_STABLE indicates that cluster services are started.

# tail -f /var/hacmp/adm/cluster.log | cut -f 4,7- -d" "
  Look for EVENT COMPLETED: node_up_complete for all the nodes where you started cluster services.

# cldump | more   (or clstat; you must start clinfoES to use clstat)
   Look for:
       Cluster State is UP
       Cluster Substate is STABLE
       Both nodes' State is UP
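
A small wrapper that runs the first three checks in sequence; the commands and log path are the ones above, while the messages and exit codes are only illustrative:

#!/bin/ksh
# Sketch: confirm HACMP cluster services are up on this node.

# 1. Group Services should be running (clcheck_server returns 1 when started).
clcheck_server grpsvcs
[ $? -ne 1 ] && { echo "grpsvcs not started"; exit 1; }

# 2. The cluster manager should report ST_STABLE.
STATE=$(lssrc -ls clstrmgrES | grep state)
echo "clstrmgrES: $STATE"
echo "$STATE" | grep -q ST_STABLE || { echo "cluster not stable"; exit 1; }

# 3. Show the most recent node_up_complete events from the cluster log.
grep "EVENT COMPLETED: node_up_complete" /var/hacmp/adm/cluster.log | tail -5

echo "Cluster services appear to be started."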

Determine the cluster name and networks

/usr/es/sbin/cluster/utilities/cltopinfo

Cluster Name: cluster1
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There are 2 node(s) and 3 network(s) defined

NODE node1:
        Network net_ether_01
                cluster	192.168.53.2
                node1	192.168.53.8
                node1_s	192.168.49.2
        Network net_tmscsi_0
                tmscsi0_node1  /dev/tmscsi0
        Network net_tmscsi_1
                tmscsi1_node1  /dev/tmscsi1

NODE node2:
        Network net_ether_01
                cluster	192.168.53.2
                node2	192.168.53.9
                node2_s	192.168.59.3
        Network net_tmscsi_0
                tmscsi0_node2  /dev/tmscsi0
        Network net_tmscsi_1
                tmscsi1_node2  /dev/tmscsi1

Resource Group cache
        Startup Policy   Online Using Distribution Policy
        Fallover Policy  Fallover To Next Priority Node In The List
        Fallback Policy  Never Fallback
        Participating Nodes      node1 node2
        Service IP Label             cluster
        
        Total Heartbeats Missed:        788
Cluster Topology Start Time:    05/25/2009 21:41:14

 

Determine the cluster ID

/usr/es/sbin/cluster/utilities/clrsctinfo -p cllsclstr

1472902783      cluster1   Standard

 

Verify the operational status of the topology services subsystem

lssrc -ls topsvcs

 

Verify the HACMP configuration:

/usr/es/sbin/cluster/diag/clconfig -v -O

Example output from clconfig -v -O

HACMPnode ODM on node tomato verified.
HACMPnetwork ODM on node tomato verified.
HACMPcluster ODM on node tomato verified.
HACMPnim ODM on node tomato verified.
HACMPadapter ODM on node tomato verified.
HACMPtopsvcs ODM on node tomato verified.
HACMPsite ODM on node tomato verified.
HACMPnode ODM on node tomato verified.
HACMPgroup ODM on node tomato verified.
HACMPresource ODM on node tomato verified.
HACMPserver ODM on node tomato verified.
HACMPcommadapter ODM on node tomato verified.
HACMPcommlink ODM on node tomato verified.
HACMPx25 ODM on node tomato verified.
HACMPsna ODM on node tomato verified.
HACMPevent ODM on node tomato verified.
HACMPcustom ODM on node tomato verified.
HACMPlogs ODM on node tomato verified.
HACMPtape ODM on node tomato verified.
HACMPmonitor ODM on node tomato verified.
HACMPpager ODM on node tomato verified.
HACMPport ODM on node tomato verified.
HACMPnpp ODM on node tomato verified.
HACMPude ODM on node tomato verified.
HACMPrresmethods ODM on node tomato verified.
HACMPdisksubsys ODM on node tomato verified.
HACMPpprc ODM on node tomato verified.
HACMPpairtasks ODM on node tomato verified.
HACMPpathtasks ODM on node tomato verified.
HACMPercmf ODM on node tomato verified.
HACMPercmfglobals ODM on node tomato verified.
HACMPtimer ODM on node tomato verified.
HACMPsiteinfo ODM on node tomato verified.
HACMPtimersvc ODM on node tomato verified.
HACMPfilecollection ODM on node tomato verified.
HACMPfcfile ODM on node tomato verified.
HACMPrgdependency ODM on node tomato verified.
HACMPrg_loc_dependency ODM on node tomato verified.
HACMPsvc ODM on node tomato verified.
HACMPsvcpprc ODM on node tomato verified.
HACMPsvcrelationship ODM on node tomato verified.
HACMPsa_metadata ODM on node tomato verified.
HACMPcsserver ODM on node tomato verified.
HACMPoemfsmethods ODM on node tomato verified.
HACMPoemvgmethods ODM on node tomato verified.
HACMPoemvolumegroup ODM on node tomato verified.
HACMPoemfilesystem ODM on node tomato verified.
HACMPdisktype ODM on node tomato verified.

Verification to be performed on the following:
        Cluster Topology
        Cluster Resources

Retrieving data from available cluster nodes.  This could take a few minutes.........

Verifying Cluster Topology...

WARNING: Network option "nonlocsrcroute" is set to 0 on the following nodes:

        cabbage

WARNING: Network option "ipsrcrouterecv" is set to 0 on the following nodes:

        cabbage

Verifying Cluster Resources...

WARNING: Application monitors are required for detecting application failures
in order for HACMP to recover from them.  Application monitors are started
by HACMP when the resource group in which they participate is activated.
The following application(s), shown with their associated resource group,
do not have an application monitor configured:

   Application Server                Resource Group
   --------------------------------  ---------------------------------
   appserv                           data
A corrective action is available for the condition reported below:

WARNING: The LVM time stamp for shared volume group: datavg is inconsistent
with the time stamp in the VGDA for the following nodes:
node1

To correct the above condition, run verification & synchronization with
"Automatically correct errors found during verification?" set to either 'Yes'
or 'Interactive'.  The cluster must be down for the corrective action to run.

Corrective actions can be enabled for Verification and Synchronization in the
HACMP extended Verification and Synchronization SMIT fastpath "cl_sync".
Alternatively use the Initialization and Standard Configuration -> Verification
and Synchronization path where corrective actions are always executed in
interactive mode.

Remember to redo automatic error notification if configuration has changed.

Verification has completed normally.

 

Check the version of HACMP:

lslpp -L | grep cluster.es.server.rte

Example output from lslpp -L | grep cluster.es.server.rte

cluster.es.server.rte      5.4.0.1    C     F    ES Base Server Runtime

 

Show cluster services:

/usr/es/sbin/cluster/utilities/clshowsrv -v

Example output from clshowsrv -v

Status of the RSCT subsystems used by HACMP:
Subsystem         Group            PID          Status
 topsvcs          topsvcs          278684       active
 grpsvcs          grpsvcs          332026       active
 grpglsm          grpsvcs                       inoperative
 emsvcs           emsvcs           446712       active
 emaixos          emsvcs           294942       active
 ctrmc            rsct             131212       active

Status of the HACMP subsystems:
Subsystem         Group            PID          Status
 clcomdES         clcomdES         204984       active
 clstrmgrES       cluster          86080        active

Status of the optional HACMP subsystems:
Subsystem         Group            PID          Status
 clinfoES         cluster          360702       active

 

Monitor the cluster status:

/usr/sbin/cluster/clstat

Example output from clstat

			clstat - HACMP Cluster Status Monitor
            -------------------------------------

Cluster: data_cluster  (1274902884)
Wed 24 Sep 10:37:41 2008
                State: UP               Nodes: 2
                SubState: STABLE


        Node: tomato            State: UP
           Interface: tomato_s (0)              Address: 192.168.10.2
                                                State:   UP
           Interface: tomato (0)                Address: 192.168.12.4
                                                State:   DOWN
           Interface: data (0)                 Address: 192.168.12.5
                                                State:   UP
           Resource Group: data                        State:  On line

        Node: cabbage           State: DOWN
           Interface: cabbage_s (0)              Address: 192.168.10.3
                                                State:   DOWN
           Interface: cabbage (0)                Address: 192.168.12.9
                                                State:   DOWN

 

SNMP-based tool to show cluster state

/usr/es/sbin/cluster/utilities/cldump

Obtaining information via SNMP from Node: node1...

_____________________________________________________________________________
Cluster Name: CLUSTER1
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________

Node Name: node1                 State: UP

  Network Name: network1                              State: UP

    Address: 10.11.190.124   Label: net1_bootB        State: UP
    Address: 10.11.190.60    Label: net1_bootA        State: UP
    Address: 10.11.190.8     Label: net1_srvc         State: UP

  Network Name: network2         State: UP

    Address: 10.11.191.10    Label: net2_srvc         State: UP
    Address: 10.11.191.126   Label: net2_bootB        State: UP
    Address: 10.11.191.62    Label: net2_bootA        State: UP

  Network Name: ds4700a           State: UP

    Address:                 Label: node1_hdisk22     State: UP

  Network Name: ds4700b           State: UP

    Address:                 Label: node1_hdisk34_01  State: UP

Node Name: node2                  State: UP

  Network Name: network1          State: UP

    Address: 10.11.190.125   Label: node2_bootB       State: UP
    Address: 10.11.190.61    Label: node2_bootA       State: UP
    Address: 10.11.190.9     Label: node2_srvc        State: UP

  Network Name: network2          State: UP

    Address: 10.11.191.11    Label: node2_srvc        State: UP
    Address: 10.11.191.127   Label: node2_bootB       State: UP
    Address: 10.11.191.63    Label: node2_bootA       State: UP

  Network Name: ds4700a           State: UP

    Address:                 Label: node2_hdisk14     State: UP

  Network Name: ds4700b           State: UP

    Address:                 Label: node2_hdisk35_01  State: UP


Cluster Name: CLUSTER1

Resource Group Name: res_gp1
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node                         Group State
---------------------------- ---------------
node1                        ONLINE
node2                        OFFLINE

Resource Group Name: res_gp2
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node                         Group State
---------------------------- ---------------
node1                        ONLINE
node2                        OFFLINE

Resource Group Name: res_gp3
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node                         Group State
---------------------------- ---------------
node1                        ONLINE
node2                        OFFLINE

Resource Group Name: res_gp4
Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback
Site Policy: ignore
Node                         Group State
---------------------------- ---------------
node1                        ONLINE
node2                        OFFLINE
[Cluster/HA] 2011. 7. 11. 09:30
# odmget HACMPlogs   (lists the HACMP log file definitions stored in the ODM)
[HW/OS] 2011. 7. 11. 09:29
svmon -G for system-wide memory
The "virtual" column multiplied by the page size (4 * 1024 bytes) gives the amount of non-filesystem memory in use.
Total memory available is the "size" column multiplied by the page size.
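
A quick way to turn those page counts into megabytes, assuming the default 4 KB page size and the usual svmon -G "memory" line layout (size inuse free pin virtual); verify the column positions on your AIX level first:

svmon -G | awk '/^memory/ {
    printf "total: %.0f MB, in use: %.0f MB, virtual (non-FS): %.0f MB\n",
           $2*4/1024, $3*4/1024, $6*4/1024 }'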

Amount of memory committed to the native heap
= the number of 'Inuse' pages in the svmon output (times the 4 KB page size)
http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/topic/com.ibm.java.doc.diagnostics.50/diag/problem_determination/aix_mem_native_heap_usage.html

Memory usage by processes
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/topic/com.ibm.aix.prftungd/doc/prftungd/mem_use_processes.htm

svmon -P $PID -m -r -i 60 5 > svmon.out &
http://www-01.ibm.com/support/docview.wss?uid=swg21138587

for PID in 123456 78901234; do nohup svmon -P $PID -m -r -i 60 > svmon$PID.out 2>&1 & done

Monitoring the Process Size on AIX
http://www-01.ibm.com/support/docview.wss?uid=swg21222446

The segment numbers show up under Esid; in that example, segments 3, 4, 5, and 6 are used for "work", and segments 7, 8, 9, a, and b are used for mmap.
http://www.ibm.com/developerworks/systems/articles/aix4java1.html

To find out if you are running low on native heap, observe the "InUse" column of the svmon output. This column lists the pages, each 4 KB in size, that are in use. Looking at the value for each segment belonging to the native heap gives you a good idea of native heap usage. The maximum possible InUse value for a single segment is 256 MB / 4 KB = 65536 pages.
http://www.ibm.com/developerworks/systems/articles/aix4java1.html
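
A rough way to add up the 'Inuse' pages of the 32-bit native-heap segments (Esid 3 through 6, as described above). This assumes the per-segment lines of svmon -P end with the Inuse/Pin/Pgsp/Virtual columns, so the Inuse field is counted from the end of the line; check it against your own svmon output before trusting the numbers. $PID is whatever Java process you are inspecting:

svmon -P $PID | awk '
    $3 == "work" && $2 ~ /^[3-6]$/ {
        pages += $(NF-3)              # Inuse column, in 4 KB pages
    }
    END { printf "native heap in use: %.1f MB (max 256 MB per segment)\n",
                 pages * 4 / 1024 }'
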
[Cluster/HA] 2011. 7. 11. 09:23
# lssrc -ls clstrmgrES | grep state

Possible cluster states

- ST_INIT: cluster configured and down
- ST_JOINING: node joining the cluster
- ST_VOTING: Inter-node decision state for an event
- ST_RP_RUNNING: cluster running recovery program
- ST_BARRIER: clstrmgr waiting at the barrier statement
- ST_CBARRIER: clstrmgr is exiting recovery program
- ST_UNSTABLE: cluster unstable
- NOT_CONFIGURED: HA installed but not configured
- RP_FAILED: event script failed
- ST_STABLE: cluster services are running with managed resources (stable cluster) or cluster services have been "forced" down 
    with resource groups potentially in the UNMANAGED state (HACMP 5.4 only)
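
A simple polling loop built on the same lssrc command, for waiting until the cluster settles after starting services; the interval, retry count, and messages are arbitrary:

#!/bin/ksh
# Sketch: wait until clstrmgrES reports ST_STABLE.
TRIES=20                                  # ~10 minutes at 30 s per attempt
while [ $TRIES -gt 0 ]; do
    STATE=$(lssrc -ls clstrmgrES | grep state)
    echo "$(date): $STATE"
    echo "$STATE" | grep -q ST_STABLE && exit 0
    echo "$STATE" | grep -q RP_FAILED && { echo "event script failed"; exit 2; }
    sleep 30
    TRIES=$((TRIES - 1))
done
echo "Cluster did not reach ST_STABLE in time."
exit 1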