[00:17] *** goekesmi has quit IRC (Quit: Coyote finally caught me)
[00:17] *** goekesmi has joined #archiveteam-bs
[00:19] *** BlueMaxim has joined #archiveteam-bs
[00:23] *** goekesmi has quit IRC (Quit: Coyote finally caught me)
[00:24] *** goekesmi has joined #archiveteam-bs
[00:24] *** BlueMax has quit IRC (Ping timeout: 335 seconds)
[01:05] so i figured out that the news930 program must be the morning news show
[01:06] only cause they have a traffic and weather update at the end of each show
[01:31] http://techcrunch.com/2014/12/13/facebook-dumps-bing-will-introduce-its-own-search-tool/
[01:33] *** mistym_ has quit IRC (Remote host closed the connection)
[01:34] i'll be on vacation from tomorrow until dec. 28
[01:35] i'll try to be on irc, not sure if i'll be able to use the warrior
[02:14] *** mutoso has quit IRC (Read error: Operation timed out)
[02:16] *** mutoso has joined #archiveteam-bs
[02:39] *** primus104 has quit IRC (Leaving.)
[02:41] *** schbirid has quit IRC (Read error: Operation timed out)
[02:52] *** schbirid has joined #archiveteam-bs
[03:44] *** aaaaaaaaa has quit IRC (Leaving)
[03:58] *** ete_ has quit IRC (Remote host closed the connection)
[04:49] *** BlueMaxim has quit IRC (Ping timeout: 335 seconds)
[04:50] *** BlueMaxim has joined #archiveteam-bs
[05:35] *** mistym has joined #archiveteam-bs
[06:19] *** SadDM has quit IRC (Remote host closed the connection)
[06:26] *** SadDM has joined #archiveteam-bs
[07:31] *** APerti has quit IRC (Read error: Operation timed out)
[08:19] *** brayden has quit IRC (Ping timeout: 606 seconds)
[08:23] *** brayden has joined #archiveteam-bs
[08:57] *** garyrh has quit IRC (Read error: Connection reset by peer)
[08:59] *** garyrh has joined #archiveteam-bs
[09:10] *** brayden has quit IRC (Read error: Connection reset by peer)
[09:12] *** brayden has joined #archiveteam-bs
[09:26] *** Pamela25 has joined #archiveteam-bs
[09:35] *** Pamela25 has quit IRC (Read error: Connection reset by peer)
[09:37] *** primus104 has joined #archiveteam-bs
[09:37] does anyone here know a good way to measure internet connection usage on windows, ubuntu, and centos? (Pref. by program on windows)
[09:38] *** brayden has quit IRC (Quit: Leaving)
[09:38] *** brayden has joined #archiveteam-bs
[10:30] *** primus104 has quit IRC (Leaving.)
[10:33] Ctrl-S: Windows 8 includes a task manager that shows network use, Windows 7 has a Performance Monitor
[10:34] iftop and hethogs and ifconfig on Linux
[10:34] nethogs
[10:38] ty
[10:45] vnstat
[10:46] snmp
[10:46] do these log over time, say to get a daily total?
[10:47] vnstat
[10:47] is it worth porting my downloader scripts to use WARC?
[10:47] maybe, maybe not
[10:47] atm they just output html
[10:48] i make scripts for asst art websites for personal use
[10:48] if i like something someone does, I save everything they've done
[10:48] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:48] you need a WARC to get anything into wayback machine
[10:48] WARCs preserve redirects and HTTP headers
[10:51] can i just dump the data from mechanize into a python warc library, or do I need to handle each header myself?
[10:51] *** rejon has joined #archiveteam-bs
[10:51] like warc_data = warkify(br.info(),br.read())
[10:53] now this is awesome
[10:54] we have a new toy?
[10:54] looks like one of my old g4 videos in mpeg2 got closed captions
[10:55] example: https://archive.org/details/Gphoria_Prephoria_2004_With_Commercials
[10:56] it doesn't look like it got re-derived now
[10:59] anyways
[10:59] Ctrl-S: you will probably have to program it yourself
[11:00] unless you want to use wget or wpull and script that
[11:00] or use a WARC proxy that writes WARCs
[11:00] would there be a way to de-dup web archive?
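On the 10:51 question about dumping mechanize output into a WARC: below is a rough sketch, using only the Python standard library, of what "program it yourself" might involve for a single response record. `rebuild_http_response` and `warc_response_record` are hypothetical names, not part of mechanize or of any WARC library. It assumes you only have parsed headers plus a body (roughly what the `br.info()` and `br.read()` calls in the question would supply), so the HTTP block it writes is a reconstruction and not the exact bytes the server sent. Tools like wget or wpull, mentioned at 11:00, avoid that limitation.

```python
import uuid
from datetime import datetime, timezone


def rebuild_http_response(status, reason, header_pairs, body):
    """Re-serialize a parsed response into HTTP/1.1-style bytes.

    Assumption: the original status line and header order are already
    lost, so this is a reconstruction, not the bytes from the wire.
    """
    lines = ["HTTP/1.1 %d %s" % (status, reason)]
    lines += ["%s: %s" % (name, value) for name, value in header_pairs]
    return "\r\n".join(lines).encode("latin-1") + b"\r\n\r\n" + body


def warc_response_record(url, http_bytes, when=None):
    """Build one uncompressed WARC/1.0 'response' record as bytes."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fields = [
        "WARC/1.0",
        "WARC-Type: response",
        "WARC-Target-URI: " + url,
        "WARC-Date: " + when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "Content-Type: application/http; msgtype=response",
        "Content-Length: %d" % len(http_bytes),
    ]
    header = "\r\n".join(fields).encode("utf-8") + b"\r\n\r\n"
    # The record block is followed by two CRLF pairs.
    return header + http_bytes + b"\r\n\r\n"


if __name__ == "__main__":
    http_bytes = rebuild_http_response(
        200, "OK", [("Content-Type", "text/html")], b"<html></html>"
    )
    record = warc_response_record("http://example.com/", http_bytes)
    print(len(record), "bytes in record")
```

Each header still has to be passed through by hand, which is the cost the question was asking about.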
[11:03] i'm only thinking this cause the wayback machine sometimes downloads videos like 800+ times
[11:04] I'd think they already do that
[11:05] i wouldn't think they do
[11:05] and then store the deduplicated stuff in a properly redundant manner
[11:06] i only think they don't cause it would be in different web archives on different dates
[11:11] *** mistym has quit IRC (Remote host closed the connection)
[11:25] http://alonigi.kinja.com/bitcasa-give-me-my-data-back-1670878439?invitelink
[11:27] that's with a guy trying to stay under 1TB
[11:27] they better start conceptualizing
[11:27] i read the url as 'gave' instead of give.. i was happy for a moment
[11:28] same here
[11:28] same
[11:45] so it's only 38.3gb for about 4 years worth of kbs news930 program
[11:47] *** primus104 has joined #archiveteam-bs
[11:49] so looks like PSC also had the same problem as MSNBC
[11:49] only started to get record on september 14 2001
[12:39] *** Zebranky has quit IRC (Remote host closed the connection)
[12:40] SketchCow: looks like this can be put in a collection: https://archive.org/search.php?query=creator%3A%22The%20MagPi%22&sort=-date
[12:40] i was going to be uploading it but looks like someone beat me to it
[13:35] *** lox has joined #archiveteam-bs
[13:42] *** lox has quit IRC ()
[14:58] *** staree has joined #archiveteam-bs
[15:06] *** staree has quit IRC (Quit: Page closed)
[15:32] *** rejon has quit IRC (Ping timeout: 369 seconds)
[16:01] *** primus104 has quit IRC (Leaving.)
[16:05] *** Zebranky has joined #archiveteam-bs
[16:15] *** toad1 has quit IRC (Read error: Operation timed out)
[16:15] *** toad2 has joined #archiveteam-bs
[16:29] *** aaaaaaaaa has joined #archiveteam-bs
[17:07] *** primus104 has joined #archiveteam-bs
[17:19] off to argentina today
[17:19] bye
[17:19] *** Start has quit IRC (Quit: Disconnected.)
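On the de-dup question from 11:00 to 11:06: a small sketch of why the same video captured on different dates does not look identical at the whole-response level, and how hashing only the payload would still let a tool spot the repeat. `payload_digest` and `find_duplicates` are hypothetical helpers, and the `sha1:` plus base32 form is an assumption modeled on the `WARC-Payload-Digest` values that common WARC writers emit. Nothing here says whether archive.org actually dedups this way.

```python
import base64
import hashlib


def payload_digest(http_bytes):
    """Return 'sha1:<base32>' for the HTTP body (bytes after the header block)."""
    _, _, body = http_bytes.partition(b"\r\n\r\n")
    digest = hashlib.sha1(body).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


def find_duplicates(captures):
    """Group capture labels by payload digest; keep groups seen more than once.

    captures is an iterable of (label, raw_http_response_bytes) pairs.
    """
    seen = {}
    for label, http_bytes in captures:
        seen.setdefault(payload_digest(http_bytes), []).append(label)
    return {d: labels for d, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    # Same body, different Date header: the full responses differ,
    # the payload digests do not.
    first = b"HTTP/1.1 200 OK\r\nDate: Sun, 14 Dec 2014 11:03:00 GMT\r\n\r\nvideo-bytes"
    second = b"HTTP/1.1 200 OK\r\nDate: Mon, 15 Dec 2014 09:00:00 GMT\r\n\r\nvideo-bytes"
    print(find_duplicates([("2014-12-14", first), ("2014-12-15", second)]))
```

Finding the duplicate is the cheap part. Deciding what to store in its place, and keeping replay working, is where the other costs of dedup come in.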
[17:22] Ctrl-S: WARCs are often stored as individually gzipped records, and record content may differ for two identical responses
[17:24] it doesn't rule out deduplication occurring at IA but I don't understand what space benefit they'd get that would outweigh the other costs of dedup
[17:25] at least for the specific case of Web data; dedup in other systems might happen
[17:25] email archive.org and ask I guess
[17:27] I mention the gzip thing because it does provide a useful space savings on its own; in archivebot data we get around a 2:1 compression ratio for WARC:total downloaded
[17:28] it's not stellar but taking an archive from e.g. 700 GB to 350 GB matters
[17:29] of course if all your WARCs are PNGs or JPEGs or similarly incompressible stuff then all that is irrelevant
[19:01] *** mistym has joined #archiveteam-bs
[21:14] I'll stumble on that in my cleaning, godane.
[21:14] I'm going to wreck the opensource pile.
[21:15] I took it from 560,000 to something like 210,000
[21:27] */win 5
[21:39] *** ete_ has joined #archiveteam-bs
[21:47] *** BlueMaxim has joined #archiveteam-bs
[21:49] https://imgur.com/gallery/hRf2trV (cc SketchCow)
[21:56] as someone who lives in nsw, australia, pretty sure that's child abuse even by our standards
[22:00] *** mutoso has quit IRC (Read error: Operation timed out)
[22:08] *** schbirid has quit IRC (Leaving)
[22:10] *** mutoso has joined #archiveteam-bs
[23:20] *** Aranje has quit IRC (ny.us.hub irc.paraphysics.net)
[23:20] *** Sue_ has quit IRC (ny.us.hub irc.paraphysics.net)
[23:20] *** dx has quit IRC (ny.us.hub irc.paraphysics.net)
[23:20] *** sep332 has quit IRC (ny.us.hub irc.paraphysics.net)
[23:20] *** ivan`_ has quit IRC (ny.us.hub irc.paraphysics.net)
[23:20] *** phuzion has quit IRC (ny.us.hub irc.paraphysics.net)
[23:20] *** Sellyme_ has quit IRC (ny.us.hub irc.paraphysics.net)
[23:21] *** Aranje has joined #archiveteam-bs
[23:21] *** Sue_ has joined #archiveteam-bs
[23:21] *** dx has joined #archiveteam-bs
[23:21] *** sep332 has joined #archiveteam-bs
[23:21] *** ivan`_ has joined #archiveteam-bs
[23:21] *** phuzion has joined #archiveteam-bs
[23:21] *** Sellyme_ has joined #archiveteam-bs
[23:21] *** irc.paraphysics.net sets mode: +o Sellyme_
[23:21] *** mistym has quit IRC (Remote host closed the connection)
[23:24] *** APerti has joined #archiveteam-bs
[23:48] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[23:50] *** Lord_Nigh has joined #archiveteam-bs
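The 17:22 to 17:29 remarks about individually gzipped records can be tried out with the standard `gzip` module. This is only a sketch: `gzip_records` and `compression_ratio` are hypothetical helper names and the sample data is made up. It shows each record compressed as its own gzip member, the members concatenated into one stream, and the raw-to-compressed ratio for text-like versus already-compressed-looking data. The 2:1 figure quoted in the chat corresponds to 700 / 350 = 2.

```python
import gzip
import os


def gzip_records(records):
    """Compress each record as its own gzip member and concatenate them."""
    return b"".join(gzip.compress(record) for record in records)


def compression_ratio(records):
    """Raw size divided by per-record-gzipped size."""
    raw_size = sum(len(record) for record in records)
    return raw_size / len(gzip_records(records))


if __name__ == "__main__":
    html_like = [b"<p>hello archive</p>\n" * 500 for _ in range(3)]
    image_like = [os.urandom(4096) for _ in range(3)]  # stand-in for PNG/JPEG data

    # Concatenated gzip members still decompress as one stream.
    assert gzip.decompress(gzip_records(html_like)) == b"".join(html_like)

    print("text ratio:  %.1f" % compression_ratio(html_like))
    print("image ratio: %.2f" % compression_ratio(image_like))
```

Because every record is a separate member, a reader can seek to one record's offset and decompress just that record, at the cost of not sharing compression context between records.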