#archiveteam-bs 2014-12-14,Sun

Time Nickname Message
00:17 🔗 goekesmi has quit IRC (Quit: Coyote finally caught me)
00:17 🔗 goekesmi has joined #archiveteam-bs
00:19 🔗 BlueMaxim has joined #archiveteam-bs
00:23 🔗 goekesmi has quit IRC (Quit: Coyote finally caught me)
00:24 🔗 goekesmi has joined #archiveteam-bs
00:24 🔗 BlueMax has quit IRC (Ping timeout: 335 seconds)
01:05 🔗 godane so i figured out that the news930 program must be the morning news show
01:06 🔗 godane only cause they have traffic and weather update at the end of each show
01:31 🔗 Start http://techcrunch.com/2014/12/13/facebook-dumps-bing-will-introduce-its-own-search-tool/
01:33 🔗 mistym_ has quit IRC (Remote host closed the connection)
01:34 🔗 Start i'll be on vacation from tomorrow until dec. 28
01:35 🔗 Start i'll try to be on irc, not sure if i'll be able to use the warrior
02:14 🔗 mutoso has quit IRC (Read error: Operation timed out)
02:16 🔗 mutoso has joined #archiveteam-bs
02:39 🔗 primus104 has quit IRC (Leaving.)
02:41 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:52 🔗 schbirid has joined #archiveteam-bs
03:44 🔗 aaaaaaaaa has quit IRC (Leaving)
03:58 🔗 ete_ has quit IRC (Remote host closed the connection)
04:49 🔗 BlueMaxim has quit IRC (Ping timeout: 335 seconds)
04:50 🔗 BlueMaxim has joined #archiveteam-bs
05:35 🔗 mistym has joined #archiveteam-bs
06:19 🔗 SadDM has quit IRC (Remote host closed the connection)
06:26 🔗 SadDM has joined #archiveteam-bs
07:31 🔗 APerti has quit IRC (Read error: Operation timed out)
08:19 🔗 brayden has quit IRC (Ping timeout: 606 seconds)
08:23 🔗 brayden has joined #archiveteam-bs
08:57 🔗 garyrh has quit IRC (Read error: Connection reset by peer)
08:59 🔗 garyrh has joined #archiveteam-bs
09:10 🔗 brayden has quit IRC (Read error: Connection reset by peer)
09:12 🔗 brayden has joined #archiveteam-bs
09:26 🔗 Pamela25 has joined #archiveteam-bs
09:35 🔗 Pamela25 has quit IRC (Read error: Connection reset by peer)
09:37 🔗 primus104 has joined #archiveteam-bs
09:37 🔗 Ctrl-S does anyone here know a good way to measure internet connection usage on windows, ubuntu, and centos? (Pref. by program on windows)
09:38 🔗 brayden has quit IRC (Quit: Leaving)
09:38 🔗 brayden has joined #archiveteam-bs
10:30 🔗 primus104 has quit IRC (Leaving.)
10:33 🔗 ivan`_ Ctrl-S: Windows 8 includes a task manager that shows network use, Windows 7 has a Performance Monitor
10:34 🔗 ivan`_ iftop and hethogs and ifconfig on Linux
10:34 🔗 ivan`_ nethogs
10:38 🔗 Ctrl-S ty
10:45 🔗 schbirid vnstat
10:46 🔗 midas snmp
10:46 🔗 Ctrl-S do these log over time, say to get a daily total?
10:47 🔗 schbirid vnstat
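For a rough daily total on Linux without installing anything, you can sample the kernel's cumulative per-interface byte counters in /proc/net/dev twice and subtract. vnstat builds on the same kernel-provided interface statistics but keeps its own persistent database. Below is a minimal Python sketch. It assumes the standard /proc/net/dev layout: two header lines, then eight receive columns followed by eight transmit columns per interface. The function names are hypothetical. Counter resets (for example after a reboot) are simply clamped to zero, so this is only an estimate and not a replacement for vnstat.

```python
import os
import time

def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # fields[0] is received bytes; fields[8] is the first transmit column (bytes)
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def usage_between(before, after):
    """Bytes received/sent per interface between two snapshots.

    Interfaces missing from the first snapshot report 0. A negative delta
    (counter reset) is clamped to 0, so traffic before the reset is lost.
    """
    usage = {}
    for iface, (rx1, tx1) in after.items():
        rx0, tx0 = before.get(iface, (rx1, tx1))
        usage[iface] = (max(rx1 - rx0, 0), max(tx1 - tx0, 0))
    return usage

def read_snapshot(path="/proc/net/dev"):
    with open(path) as f:
        return parse_proc_net_dev(f.read())

if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    start = read_snapshot()
    time.sleep(1)  # a daily job would store `start` and compare 24 hours later
    for iface, (rx, tx) in usage_between(start, read_snapshot()).items():
        print(f"{iface}: {rx} bytes in, {tx} bytes out")
```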
10:47 🔗 Ctrl-S is it worth porting my downloader scripts to use WARC?
10:47 🔗 schbirid maybe, maybe not
10:47 🔗 Ctrl-S atm they just output html
10:48 🔗 Ctrl-S i make scripts for assorted art websites for personal use
10:48 🔗 Ctrl-S if i like something someone does, I save everything they've done
10:48 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:48 🔗 ivan`_ you need a WARC to get anything into wayback machine
10:48 🔗 ivan`_ WARCs preserve redirects and HTTP headers
10:51 🔗 Ctrl-S can i just dump the data from mechanize into a python warc library, or do I need to handle each header myself?
10:51 🔗 rejon has joined #archiveteam-bs
10:51 🔗 Ctrl-S like warc_data = warkify(br.info(),br.read())
10:53 🔗 godane now this is awesome
10:54 🔗 Ctrl-S we have a new toy?
10:54 🔗 godane looks like one of my old g4 videos in mpeg2 got closed caption
10:55 🔗 godane example: https://archive.org/details/Gphoria_Prephoria_2004_With_Commercials
10:56 🔗 godane it doesn't look like it got re-derived now
10:59 🔗 godane anyways
10:59 🔗 ivan`_ Ctrl-S: you will probably have to program it yourself
11:00 🔗 ivan`_ unless you want to use wget or wpull and script that
11:00 🔗 ivan`_ or use a WARC proxy that writes WARCs
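Here is what "handling each header yourself" involves: a minimal sketch that packs one HTTP exchange into a WARC/1.0 response record using only the Python standard library. It assumes the caller still has the raw status line, the headers as received, and the exact body bytes. `make_warc_response` is a hypothetical name standing in for the `warkify` idea above, not a function from any existing library. Python HTTP clients built on http.client, as mechanize is, typically hand back bodies with chunked transfer encoding already removed. In that case the record is not byte-exact, which is one reason to let wget, wpull, or a WARC-writing proxy capture the traffic instead. A complete file would normally also contain a warcinfo record and the matching request records, which this sketch omits.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone

def make_warc_response(uri, status_line, headers, body):
    """Build one uncompressed WARC/1.0 'response' record as bytes.

    status_line: the HTTP status line as received, e.g. "HTTP/1.1 200 OK"
    headers: list of (name, value) pairs in the order received
    body: the response body bytes exactly as sent by the server
    """
    # The record block is the raw HTTP response: status line, headers, blank line, body.
    # Header text is encoded as latin-1, matching how http.client decodes header bytes.
    http_head = status_line + "\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    block = http_head.encode("latin-1") + body

    warc_headers = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", uri),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Digest of the body only, so later tools can spot identical payloads.
        ("WARC-Payload-Digest",
         "sha1:" + base64.b32encode(hashlib.sha1(body).digest()).decode("ascii")),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(block))),
    ]
    warc_head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in warc_headers) + "\r\n"
    # Every record ends with two CRLFs after its block.
    return warc_head.encode("utf-8") + block + b"\r\n\r\n"

if __name__ == "__main__":
    record = make_warc_response(
        "http://example.com/",
        "HTTP/1.1 200 OK",
        [("Content-Type", "text/html; charset=utf-8")],
        b"<html><body>hi</body></html>",
    )
    print(record.decode("utf-8", errors="replace"))
```

Appending each returned record to a file opened in binary append mode gives an uncompressed .warc file. The WARC-Date and WARC-Record-ID headers change on every call, so two identical responses still produce different records.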
11:00 🔗 godane would there be a way to de-dup web archive?
11:03 🔗 godane i'm only thinking this cause of way back machine sometimes downloads videos like 800+ times
11:04 🔗 Ctrl-S I'd think they already do that
11:05 🔗 godane i wouldn't think they do
11:05 🔗 Ctrl-S and then store the deduplicated stuff in a properly redundant manner
11:06 🔗 godane i only think they don't cause it would be in different web archives on different dates
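For context, the usual WARC-level approach to the repeated-video case is payload-digest deduplication. Each response body is hashed. When a later capture has the same digest, the crawler writes a small "revisit" record pointing at the first capture instead of storing the bytes again. wget's --warc-dedup option works along these lines, reading a CDX file of earlier captures. Whether and how the Wayback Machine does anything like this internally is not settled in this log. Below is a minimal sketch of the bookkeeping only. The function names are hypothetical, and captures are assumed to arrive as (uri, date, body) tuples.

```python
import base64
import hashlib

def payload_digest(body):
    """SHA-1 of the payload in the base32 "sha1:..." form common in WARC files."""
    return "sha1:" + base64.b32encode(hashlib.sha1(body).digest()).decode("ascii")

def plan_dedup(captures):
    """Decide which captures need a full 'response' record.

    captures: iterable of (uri, date, body) in crawl order.
    Returns one (record_type, uri, date, digest, original) tuple per capture;
    for a 'revisit', original is the (uri, date) of the first capture with the
    same payload digest, otherwise it is None.
    """
    first_seen = {}
    plan = []
    for uri, date, body in captures:
        digest = payload_digest(body)
        if digest in first_seen:
            # Same bytes seen before: keep only a pointer, not the payload.
            plan.append(("revisit", uri, date, digest, first_seen[digest]))
        else:
            first_seen[digest] = (uri, date)
            plan.append(("response", uri, date, digest, None))
    return plan

if __name__ == "__main__":
    video = b"\x00\x01" * 50000  # pretend this is the same video fetched repeatedly
    captures = [("http://example.com/video.flv", f"2014-12-{d:02d}", video) for d in range(1, 4)]
    for record_type, uri, date, _, original in plan_dedup(captures):
        print(record_type, uri, date, original)
```

godane's objection is the crux here. The `first_seen` index has to cover every earlier WARC, not just the one being written, because otherwise captures from different dates never meet. That is why wget's dedup is fed a CDX file of previous captures.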
11:11 🔗 mistym has quit IRC (Remote host closed the connection)
11:25 🔗 joepie91 http://alonigi.kinja.com/bitcasa-give-me-my-data-back-1670878439?invitelink
11:27 🔗 godane thats with a guy trying to stay under 1TB
11:27 🔗 ivan`_ they better start conceptualizing
11:27 🔗 Kazzy i read the url as 'gave' instead of give.. i was happy for a moment
11:28 🔗 godane same here
11:28 🔗 ivan`_ same
11:45 🔗 godane so its only 38.3gb for about 4 years worth of kbs news930 program
11:47 🔗 primus104 has joined #archiveteam-bs
11:49 🔗 godane so looks like PSC also had the same problem as MSNBC
11:49 🔗 godane only started getting recorded on september 14 2001
12:39 🔗 Zebranky has quit IRC (Remote host closed the connection)
12:40 🔗 godane SketchCow: looks like this can be put in a collection: https://archive.org/search.php?query=creator%3A%22The%20MagPi%22&sort=-date
12:40 🔗 godane i was going to be uploading it but looks like someone beat me to it
13:35 🔗 lox has joined #archiveteam-bs
13:42 🔗 lox has quit IRC ()
14:58 🔗 staree has joined #archiveteam-bs
15:06 🔗 staree has quit IRC (Quit: Page closed)
15:32 🔗 rejon has quit IRC (Ping timeout: 369 seconds)
16:01 🔗 primus104 has quit IRC (Leaving.)
16:05 🔗 Zebranky has joined #archiveteam-bs
16:15 🔗 toad1 has quit IRC (Read error: Operation timed out)
16:15 🔗 toad2 has joined #archiveteam-bs
16:29 🔗 aaaaaaaaa has joined #archiveteam-bs
17:07 🔗 primus104 has joined #archiveteam-bs
17:19 🔗 Start off to argentina today
17:19 🔗 Start bye
17:19 🔗 Start has quit IRC (Quit: Disconnected.)
17:22 🔗 yipdw Ctrl-S: WARCs are often stored as individually gzipped records, and record content may differ for two identical responses
17:24 🔗 yipdw it doesn't rule out deduplication occurring at IA but I don't understand what space benefit they'd get that would outweigh the other costs of dedup
17:25 🔗 yipdw at least for the specific case of Web data; dedup in other systems might happen
17:25 🔗 yipdw email archive.org and ask I guess
17:27 🔗 yipdw I mention the gzip thing because it does provide a useful space savings on its own; in archivebot data we get around a 2:1 compression ratio for WARC:total downloaded
17:28 🔗 yipdw it's not stellar but taking an archive from e.g. 700 GB to 350 GB matters
17:29 🔗 yipdw of course if all your WARCs are PNGs or JPEGs or similarly incompressible stuff then all that is irrelevant
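yipdw's points about gzip can be checked in a few lines of Python. Each WARC record is compressed as its own gzip member and the members are concatenated. The result is still valid gzip, yet any single record can be decompressed starting from its byte offset. The sketch below uses hypothetical function names. It also contrasts repetitive HTML with random bytes, which stand in for already-compressed JPEG or PNG data. The ratios it prints depend entirely on the input and say nothing about the roughly 2:1 archivebot figure quoted above.

```python
import gzip
import os

def gzip_records(records):
    """Compress each record separately and concatenate the gzip members.

    Returns (data, offsets): offsets[i] is where record i's member starts,
    so it can be read back without decompressing the records before it.
    """
    out = bytearray()
    offsets = []
    for record in records:
        offsets.append(len(out))
        out += gzip.compress(record)
    return bytes(out), offsets

def read_record(data, offsets, i):
    """Decompress only record i, using the member boundaries."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
    return gzip.decompress(data[offsets[i]:end])

if __name__ == "__main__":
    html = b"<html><body>" + b"<p>hello archive</p>" * 500 + b"</body></html>"
    random_bytes = os.urandom(len(html))  # stand-in for JPEG/PNG payloads
    for name, payload in [("html", html), ("random", random_bytes)]:
        data, _ = gzip_records([payload])
        print(f"{name}: {len(payload)} -> {len(data)} bytes, ratio {len(payload) / len(data):.2f}")
```

Compressing per record rather than the whole file costs some ratio, because each member starts with an empty dictionary. The trade buys random access by offset, which is what index files for .warc.gz archives rely on.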
19:01 🔗 mistym has joined #archiveteam-bs
21:14 🔗 SketchCow I'll stumble on that in my cleaning, godane.
21:14 🔗 SketchCow I'm going to wreck the opensource pile.
21:15 🔗 SketchCow I took it from 560,000 to something like 210,000
21:27 🔗 SketchCow */win 5
21:39 🔗 ete_ has joined #archiveteam-bs
21:47 🔗 BlueMaxim has joined #archiveteam-bs
21:49 🔗 joepie91 https://imgur.com/gallery/hRf2trV (cc SketchCow)
21:56 🔗 BlueMaxim as someone who lives in nsw, australia, pretty sure that's child abuse even by our standards
22:00 🔗 mutoso has quit IRC (Read error: Operation timed out)
22:08 🔗 schbirid has quit IRC (Leaving)
22:10 🔗 mutoso has joined #archiveteam-bs
23:20 🔗 Aranje has quit IRC (ny.us.hub irc.paraphysics.net)
23:20 🔗 Sue_ has quit IRC (ny.us.hub irc.paraphysics.net)
23:20 🔗 dx has quit IRC (ny.us.hub irc.paraphysics.net)
23:20 🔗 sep332 has quit IRC (ny.us.hub irc.paraphysics.net)
23:20 🔗 ivan`_ has quit IRC (ny.us.hub irc.paraphysics.net)
23:20 🔗 phuzion has quit IRC (ny.us.hub irc.paraphysics.net)
23:20 🔗 Sellyme_ has quit IRC (ny.us.hub irc.paraphysics.net)
23:21 🔗 Aranje has joined #archiveteam-bs
23:21 🔗 Sue_ has joined #archiveteam-bs
23:21 🔗 dx has joined #archiveteam-bs
23:21 🔗 sep332 has joined #archiveteam-bs
23:21 🔗 ivan`_ has joined #archiveteam-bs
23:21 🔗 phuzion has joined #archiveteam-bs
23:21 🔗 Sellyme_ has joined #archiveteam-bs
23:21 🔗 irc.paraphysics.net sets mode: +o Sellyme_
23:21 🔗 mistym has quit IRC (Remote host closed the connection)
23:24 🔗 APerti has joined #archiveteam-bs
23:48 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
23:50 🔗 Lord_Nigh has joined #archiveteam-bs