Time |
Nickname |
Message |
00:17
🔗
|
|
goekesmi has quit IRC (Quit: Coyote finally caught me) |
00:17
🔗
|
|
goekesmi has joined #archiveteam-bs |
00:19
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
00:23
🔗
|
|
goekesmi has quit IRC (Quit: Coyote finally caught me) |
00:24
🔗
|
|
goekesmi has joined #archiveteam-bs |
00:24
🔗
|
|
BlueMax has quit IRC (Ping timeout: 335 seconds) |
01:05
🔗
|
godane |
so i figured out that the news930 program must be the morning news show |
01:06
🔗
|
godane |
only cause they have traffic and weather update at the end of each show |
01:31
🔗
|
Start |
http://techcrunch.com/2014/12/13/facebook-dumps-bing-will-introduce-its-own-search-tool/ |
01:33
🔗
|
|
mistym_ has quit IRC (Remote host closed the connection) |
01:34
🔗
|
Start |
i'll be on vacation from tomorrow until dec. 28 |
01:35
🔗
|
Start |
i'll try to be on irc, not sure if i'll be able to use the warrior |
02:14
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
02:16
🔗
|
|
mutoso has joined #archiveteam-bs |
02:39
🔗
|
|
primus104 has quit IRC (Leaving.) |
02:41
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
02:52
🔗
|
|
schbirid has joined #archiveteam-bs |
03:44
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
03:58
🔗
|
|
ete_ has quit IRC (Remote host closed the connection) |
04:49
🔗
|
|
BlueMaxim has quit IRC (Ping timeout: 335 seconds) |
04:50
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
05:35
🔗
|
|
mistym has joined #archiveteam-bs |
06:19
🔗
|
|
SadDM has quit IRC (Remote host closed the connection) |
06:26
🔗
|
|
SadDM has joined #archiveteam-bs |
07:31
🔗
|
|
APerti has quit IRC (Read error: Operation timed out) |
08:19
🔗
|
|
brayden has quit IRC (Ping timeout: 606 seconds) |
08:23
🔗
|
|
brayden has joined #archiveteam-bs |
08:57
🔗
|
|
garyrh has quit IRC (Read error: Connection reset by peer) |
08:59
🔗
|
|
garyrh has joined #archiveteam-bs |
09:10
🔗
|
|
brayden has quit IRC (Read error: Connection reset by peer) |
09:12
🔗
|
|
brayden has joined #archiveteam-bs |
09:26
🔗
|
|
Pamela25 has joined #archiveteam-bs |
09:35
🔗
|
|
Pamela25 has quit IRC (Read error: Connection reset by peer) |
09:37
🔗
|
|
primus104 has joined #archiveteam-bs |
09:37
🔗
|
Ctrl-S |
does anyone here know a good way to measure internet connection usage on windows, ubuntu, and centos? (Pref. by program on windows) |
09:38
🔗
|
|
brayden has quit IRC (Quit: Leaving) |
09:38
🔗
|
|
brayden has joined #archiveteam-bs |
10:30
🔗
|
|
primus104 has quit IRC (Leaving.) |
10:33
🔗
|
ivan`_ |
Ctrl-S: Windows 8 includes a task manager that shows network use, Windows 7 has a Performance Monitor |
10:34
🔗
|
ivan`_ |
iftop and hethogs and ifconfig on Linux |
10:34
🔗
|
ivan`_ |
nethogs |
10:38
🔗
|
Ctrl-S |
ty |
10:45
🔗
|
schbirid |
vnstat |
10:46
🔗
|
midas |
snmp |
10:46
🔗
|
Ctrl-S |
do these log over time, say to get a daily total? |
10:47
🔗
|
schbirid |
vnstat |
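A rough sketch of the "daily total" idea on Linux, for anyone who would rather script it than install a tool: the kernel exposes cumulative per-interface byte counters in `/proc/net/dev`, so a total for a period is the difference between two snapshots. The function names here are made up for illustration, this is not how vnstat itself is implemented, and the counters restart from zero on reboot, so anything long-running needs to store snapshots somewhere.

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    totals = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        cols = fields.split()
        # Column 0 is received bytes; column 8 is transmitted bytes.
        totals[name.strip()] = (int(cols[0]), int(cols[8]))
    return totals


def read_counters(path="/proc/net/dev"):
    """Take a snapshot of the current counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())


def usage_between(before, after):
    """Bytes (rx, tx) moved per interface between two snapshots."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in after
        if name in before
    }
```

Taking one snapshot at midnight from cron and diffing it against the previous day's snapshot would give a daily figure per interface.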
10:47
🔗
|
Ctrl-S |
is it worth porting my downloader scripts to use WARC? |
10:47
🔗
|
schbirid |
maybe, maybe not |
10:47
🔗
|
Ctrl-S |
atm they just output html |
10:48
🔗
|
Ctrl-S |
i make scripts for asst art websites for personal use |
10:48
🔗
|
Ctrl-S |
if i like something someone does, I save everything they've done |
10:48
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:48
🔗
|
ivan`_ |
you need a WARC to get anything into wayback machine |
10:48
🔗
|
ivan`_ |
WARCs preserve redirects and HTTP headers |
10:51
🔗
|
Ctrl-S |
can i just dump the data from mechanize into a python warc library, or do I need to handle each header myself? |
10:51
🔗
|
|
rejon has joined #archiveteam-bs |
10:51
🔗
|
Ctrl-S |
like warc_data = warkify(br.info(),br.read()) |
10:53
🔗
|
godane |
now this is awesome |
10:54
🔗
|
Ctrl-S |
we have a new toy? |
10:54
🔗
|
godane |
looks like one of my old g4 videos in mpeg2 got closed caption |
10:55
🔗
|
godane |
example: https://archive.org/details/Gphoria_Prephoria_2004_With_Commercials |
10:56
🔗
|
godane |
it doesn't look like it got re-derived now |
10:59
🔗
|
godane |
anyways |
10:59
🔗
|
ivan`_ |
Ctrl-S: you will probably have to program it yourself |
11:00
🔗
|
ivan`_ |
unless you want to use wget or wpull and script that |
11:00
🔗
|
ivan`_ |
or use a WARC proxy that writes WARCs |
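For the "program it yourself" route, a minimal sketch of writing one WARC/1.0 `response` record with only the standard library. It assumes you have the raw HTTP response bytes (status line, headers, blank line, body). If the HTTP library has already parsed the headers, you would have to rebuild that byte string, which loses the exact original bytes. `warc_response_record` and `append_record` are names invented here, not part of any WARC library, and a complete WARC would normally also carry `warcinfo` and `request` records.

```python
import gzip
import uuid
from datetime import datetime, timezone


def warc_response_record(url, http_response_bytes, when=None):
    """Build one WARC/1.0 'response' record as bytes.

    http_response_bytes must be the raw HTTP response:
    status line, headers, blank line, body.
    """
    when = when or datetime.now(timezone.utc)
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "WARC-Date: %s" % when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI: %s" % url,
        "Content-Type: application/http; msgtype=response",
        "Content-Length: %d" % len(http_response_bytes),
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLFs after the content block.
    return head + http_response_bytes + b"\r\n\r\n"


def append_record(path, record):
    """Append the record to a .warc.gz file as its own gzip member."""
    with open(path, "ab") as f:
        f.write(gzip.compress(record))
```

Scripting wget or wpull, as suggested above, avoids all of this because those tools see the raw bytes on the wire.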
11:00
🔗
|
godane |
would there be a way to de-dup web archive? |
11:03
🔗
|
godane |
i'm only thinking this cause of way back machine sometimes downloads videos like 800+ times |
11:04
🔗
|
Ctrl-S |
I'd think they already do that |
11:05
🔗
|
godane |
i wouldn't think they do |
11:05
🔗
|
Ctrl-S |
and then store the deduplicated stuff in a properly redundant manner |
11:06
🔗
|
godane |
i only think they don't cause it would be in different web archives on different dates |
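A small sketch of why the same video captured on different dates is awkward to de-dup. The full HTTP responses differ (for example in the `Date` header) even when the body is byte-for-byte identical, so any comparison has to hash the payload alone. The WARC format does define a `WARC-Payload-Digest` header and a `revisit` record type for this purpose. Whether the Wayback Machine uses them for these captures is not something this log establishes. The helper names below are hypothetical.

```python
import hashlib


def payload_digest(http_response_bytes):
    """SHA-1 of the body only, i.e. everything after the HTTP header block."""
    _, _, body = http_response_bytes.partition(b"\r\n\r\n")
    return hashlib.sha1(body).hexdigest()


def find_duplicates(captures):
    """Map each repeated payload digest to the capture labels sharing it.

    captures is an iterable of (label, raw_http_response_bytes) pairs.
    """
    seen = {}
    for label, raw in captures:
        seen.setdefault(payload_digest(raw), []).append(label)
    return {digest: labels for digest, labels in seen.items() if len(labels) > 1}
```

A crawler with such an index could write a short `revisit` record pointing at the first capture instead of storing the body again.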
11:11
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
11:25
🔗
|
joepie91 |
http://alonigi.kinja.com/bitcasa-give-me-my-data-back-1670878439?invitelink |
11:27
🔗
|
godane |
thats with a guy trying to stay under 1TB |
11:27
🔗
|
ivan`_ |
they better start conceptualizing |
11:27
🔗
|
Kazzy |
i read the url as 'gave' instead of give.. i was happy for a moment |
11:28
🔗
|
godane |
same here |
11:28
🔗
|
ivan`_ |
same |
11:45
🔗
|
godane |
so its only 38.3gb for about 4 years worth of kbs news930 program |
11:47
🔗
|
|
primus104 has joined #archiveteam-bs |
11:49
🔗
|
godane |
so looks like PSC also had the same problem as MSNBC |
11:49
🔗
|
godane |
only started to get record on september 14 2001 |
12:39
🔗
|
|
Zebranky has quit IRC (Remote host closed the connection) |
12:40
🔗
|
godane |
SketchCow: looks like this can be put in a collection: https://archive.org/search.php?query=creator%3A%22The%20MagPi%22&sort=-date |
12:40
🔗
|
godane |
i was going to be uploading it but looks like someone beat me to it |
13:35
🔗
|
|
lox has joined #archiveteam-bs |
13:42
🔗
|
|
lox has quit IRC () |
14:58
🔗
|
|
staree has joined #archiveteam-bs |
15:06
🔗
|
|
staree has quit IRC (Quit: Page closed) |
15:32
🔗
|
|
rejon has quit IRC (Ping timeout: 369 seconds) |
16:01
🔗
|
|
primus104 has quit IRC (Leaving.) |
16:05
🔗
|
|
Zebranky has joined #archiveteam-bs |
16:15
🔗
|
|
toad1 has quit IRC (Read error: Operation timed out) |
16:15
🔗
|
|
toad2 has joined #archiveteam-bs |
16:29
🔗
|
|
aaaaaaaaa has joined #archiveteam-bs |
17:07
🔗
|
|
primus104 has joined #archiveteam-bs |
17:19
🔗
|
Start |
off to argentina today |
17:19
🔗
|
Start |
bye |
17:19
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:22
🔗
|
yipdw |
Ctrl-S: WARCs are often stored as individually gzipped records, and record content may differ for two identical responses |
17:24
🔗
|
yipdw |
it doesn't rule out deduplication occurring at IA but I don't understand what space benefit they'd get that would outweigh the other costs of dedup |
17:25
🔗
|
yipdw |
at least for the specific case of Web data; dedup in other systems might happen |
17:25
🔗
|
yipdw |
email archive.org and ask I guess |
17:27
🔗
|
yipdw |
I mention the gzip thing because it does provide a useful space savings on its own; in archivebot data we get around a 2:1 compression ratio for WARC:total downloaded |
17:28
🔗
|
yipdw |
it's not stellar but taking an archive from e.g. 700 GB to 350 GB matters |
17:29
🔗
|
yipdw |
of course if all your WARCs are PNGs or JPEGs or similarly incompressible stuff then all that is irrelevant |
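To make the gzip point concrete, a sketch that compresses each record as its own gzip member, concatenates the members (the per-record layout described above), and compares sizes. The sample data is invented, so the ratios it prints say nothing about real crawls such as the archivebot figure quoted here.

```python
import gzip
import os


def gzip_members(records):
    """Compress each record as its own gzip member and concatenate them."""
    return b"".join(gzip.compress(r) for r in records)


def compression_ratio(records):
    """Uncompressed size divided by the per-record-gzipped size."""
    raw_size = sum(len(r) for r in records)
    return raw_size / len(gzip_members(records))


# Repetitive markup stands in for HTML; random bytes stand in for
# already-compressed payloads such as PNG or JPEG.
text_like = [b"<p>hello world</p>\n" * 500 for _ in range(10)]
random_like = [os.urandom(10_000) for _ in range(10)]

print(compression_ratio(text_like))    # well above 1
print(compression_ratio(random_like))  # close to 1, nothing left to squeeze
```

Because every member is a complete gzip stream, a reader can seek to one record's offset and decompress just that record, while decompressing the whole file still yields all records in order.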
19:01
🔗
|
|
mistym has joined #archiveteam-bs |
21:14
🔗
|
SketchCow |
I'll stumble on that in my cleaning, godane. |
21:14
🔗
|
SketchCow |
I'm going to wreck the opensource pile. |
21:15
🔗
|
SketchCow |
I took it from 560,000 to something like 210,000 |
21:27
🔗
|
SketchCow |
*/win 5 |
21:39
🔗
|
|
ete_ has joined #archiveteam-bs |
21:47
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
21:49
🔗
|
joepie91 |
https://imgur.com/gallery/hRf2trV (cc SketchCow) |
21:56
🔗
|
BlueMaxim |
as someone who lives in nsw, australia, pretty sure that's child abuse even by our standards |
22:00
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
22:08
🔗
|
|
schbirid has quit IRC (Leaving) |
22:10
🔗
|
|
mutoso has joined #archiveteam-bs |
23:20
🔗
|
|
Aranje has quit IRC (ny.us.hub irc.paraphysics.net) |
23:20
🔗
|
|
Sue_ has quit IRC (ny.us.hub irc.paraphysics.net) |
23:20
🔗
|
|
dx has quit IRC (ny.us.hub irc.paraphysics.net) |
23:20
🔗
|
|
sep332 has quit IRC (ny.us.hub irc.paraphysics.net) |
23:20
🔗
|
|
ivan`_ has quit IRC (ny.us.hub irc.paraphysics.net) |
23:20
🔗
|
|
phuzion has quit IRC (ny.us.hub irc.paraphysics.net) |
23:20
🔗
|
|
Sellyme_ has quit IRC (ny.us.hub irc.paraphysics.net) |
23:21
🔗
|
|
Aranje has joined #archiveteam-bs |
23:21
🔗
|
|
Sue_ has joined #archiveteam-bs |
23:21
🔗
|
|
dx has joined #archiveteam-bs |
23:21
🔗
|
|
sep332 has joined #archiveteam-bs |
23:21
🔗
|
|
ivan`_ has joined #archiveteam-bs |
23:21
🔗
|
|
phuzion has joined #archiveteam-bs |
23:21
🔗
|
|
Sellyme_ has joined #archiveteam-bs |
23:21
🔗
|
|
irc.paraphysics.net sets mode: +o Sellyme_ |
23:21
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
23:24
🔗
|
|
APerti has joined #archiveteam-bs |
23:48
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
23:50
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |