parallel processing - OProfile with OpenMP
I am using OProfile on OpenMP-parallelized code by doing the following:

    $ gcc -I/usr/include/hdf5/serial/ -std=c11 -O3 -fopt-info -fopenmp sp_linsvm.c -o sp_linsvm -lhdf5_serial
    $ sudo ocount --events=CPU_CLK_UNHALTED,LLC_MISSES,LLC_REFS,MEM_INST_RETIRED,BR_MISP_EXEC, ./sp_linsvm

    Events were actively counted for 22.0 seconds.
    Event counts (scaled) for /home/aidan/progs/linsvm/sp_linsvm:
        Event                 Count              % time counted
        BR_MISP_EXEC          6,523,181          80.00
        CPU_CLK_UNHALTED      225,384,009,348    80.00
        LLC_MISSES            276,587,407        80.02
        LLC_REFS              1,098,236,806      80.00
        MEM_INST_RETIRED      51,754,855,734     79.99

How do I know whether the events are counted per CPU or as a whole? I am pretty sure they are counted as a whole, because the numbers are close to what I get if I compile without OpenMP, but I want to be sure.
The default mode of ocount ... ./program is "command" mode. As I understand it, without the -t (--separate-thread) or -c (--separate-cpu) options, the data from all threads is aggregated.

So, check the documentation at http://oprofile.sourceforge.net/doc/controlling-counter.html#controlling-ocount and try the -t / -c options:
The --separate-thread / -t option can be used in conjunction with either the --process-list or --thread-list option to display event counts on a per-thread (per-process) basis. Without this option, all counts are aggregated.

The --separate-cpu / -c option can be used in conjunction with either the --system-wide or --cpu-list option to display event counts on a per-CPU basis. Without this option, all counts are aggregated.