#archiveteam 2016-09-04,Sun


Time Nickname Message
00:04 JesseW luckcolor: seems sensible to me
00:07 dashcloud has joined #archiveteam
00:08 SketchCow http://fos.textfiles.com/pipeline.html
00:08 SketchCow As per Pipeline, we've begun moving nunij and area51 in (it says no mover script, that's not true)
00:08 SketchCow It's a function that I'm not using
00:19 JesseW hm, on http://fos.textfiles.com/ARCHIVETEAM/ I see various items just labeled as "archiveteam", like: archiveteam_20160902230555 which (according to the idx file) seems to consist of a bunch of torrents. Those are area51, I presume?
00:19 hictooth has quit IRC (Ping timeout: 268 seconds)
00:23 godane1 has quit IRC (Quit: Leaving.)
00:23 JesseW ah archiveteam_20160902230555 has a title that identifies it as part of the "Torrent Time Capsule"
00:24 JesseW others of the unlabeled (by identifier) ones are part of BayImg. One is the grab of thomas.congress.gov: archiveteam_201607050000
00:25 JesseW and another Friends Reunited: archiveteam_2016062210391011
00:25 JesseW which I think has a page on the wiki; I should add a link
00:25 SketchCow Thanks, detective
00:26 JesseW you knew I would
00:26 JesseW (and there already is a link)
00:27 SketchCow There's what you can do and what you should do
00:27 JesseW eh, this one *seemed* to fit both categories. If not, I'm happy to know otherwise
00:30 WinterFox has joined #archiveteam
00:35 pfallenop has quit IRC (Read error: Operation timed out)
00:43 arkiver Update on tumblr and flickr projects. I have now uploaded an original and a deduplicated WARC here https://archive.org/download/flickrestdeduphmsdfofjdsd
00:43 arkiver I asked in #warrior for help to see if these are correct.
00:43 arkiver I have also asked the wayback team at Internet Archive if they can have a look at these two WARCs.
00:44 arkiver If they are confirmed to be good, this deduplication script will be used in the tumblr and flickr projects.
00:44 arkiver xmc, PurpleSym ^
00:49 kristian_ has joined #archiveteam
01:08 arkiver for anyone running googlecode
01:08 arkiver do your items also only get 503 and then abort?
01:16 maelstrom has joined #archiveteam
01:19 arkiver Please let me know as soon as possible why your items are aborting with googlecode
01:23 JesseW Mine are just ratelimited.
01:26 arkiver hmm
01:26 arkiver well let me know if you do get any please
01:26 * arkiver is afk for the night
01:26 JesseW will do
01:26 arkiver thanks!
01:27 arkiver if someone can confirm the 503's, I'll send a mail
01:33 JesseW arkiver: Ah, I got a 503!
01:34 Brah has joined #archiveteam
01:34 JesseW but I lost the logs :-(
01:34 Brah has quit IRC (Client Quit)
01:43 JesseW arkiver: got the 503's: http://paste.nerds.io/sokikubejo.js
01:43 JesseW please send the email
01:53 VADemon has quit IRC (Quit: left4dead)
02:08 ndiddy has quit IRC (Ping timeout: 632 seconds)
02:19 kristian_ has quit IRC (Quit: Leaving)
02:19 joepie91 arkiver: JesseW: is that not the bot block page?
02:24 JesseW i think so, yes
02:24 JesseW that's why arkiver is going to write an email
02:28 kristian_ has joined #archiveteam
02:48 kristian_ has quit IRC (Quit: Leaving)
03:40 BlueMaxim has quit IRC (Read error: Operation timed out)
03:43 pfallenop has joined #archiveteam
04:04 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:08 signius has quit IRC (Read error: Operation timed out)
04:10 Sk1d has joined #archiveteam
04:24 signius has joined #archiveteam
04:28 Aranje has quit IRC (Ping timeout: 260 seconds)
04:48 maelstrom has quit IRC (Quit: Leaving)
05:08 godane has joined #archiveteam
06:08 xmc arkiver: neat. is it better to shovel around 10x more data than we are going to wind up with, or to figure out how to not fetch so much in the first place?
06:29 Honno has joined #archiveteam
06:57 BlueMaxim has joined #archiveteam
07:12 DFJustin has quit IRC (Remote host closed the connection)
07:12 DFJustin has joined #archiveteam
07:12 swebb sets mode: +o DFJustin
07:40 PurpleSym arkiver: warcat says “Bad payload digest.” for revisit and warcinfo records in the deduplicated WARC.
07:42 JesseW has quit IRC (Ping timeout: 370 seconds)
07:48 Simpbrain has quit IRC (Read error: Operation timed out)
07:53 ravetcofx has quit IRC (Ping timeout: 501 seconds)
08:03 metal_cam has joined #archiveteam
08:30 vOYtEC has joined #archiveteam
08:36 schbirid has joined #archiveteam
08:39 Simpbrain has joined #archiveteam
09:47 tuankiet has joined #archiveteam
10:05 atomotic has joined #archiveteam
10:17 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
10:20 kristian_ has joined #archiveteam
10:59 REiN^ has joined #archiveteam
11:03 bRick5772 has joined #archiveteam
11:20 VADemon has joined #archiveteam
11:30 bRick5772 has quit IRC (Quit: Leaving.)
11:51 Morbus has quit IRC (http://www.disobey.com/)
11:56 Morbus has joined #archiveteam
11:56 kristian_ has quit IRC (Quit: Leaving)
12:13 signius has quit IRC (Read error: Operation timed out)
12:24 arkiver joepie91: yeah, but we're not crawling google code too fast, so I don't think it is caused by that
12:24 arkiver JesseW: mail sent/
12:24 arkiver .*
12:27 arkiver xmc: the HTTP headers returned by flickr on the images are not the for different URLs with the same pauload (same image on c1 and c2 for example).
12:29 arkiver So if we generated the records for these other image URLs instead of crawling them, we'd have to fake or lose some of the headers
12:29 arkiver What do you think?
12:32 arkiver PurpleSym: I think this is caused by warcat not keeping revisit records in mind when checking the payload digest https://github.com/chfoo/warcat/blob/master/warcat/tool.py#L295-L300 and https://github.com/chfoo/warcat/blob/master/warcat/verify.py#L56-L67
12:32 arkiver I'm not 100% sure though
12:32 arkiver The payload digest of a revisit record is the same as the payload digest of the record with the original data, which the revisit record points to
12:33 PurpleSym You’re right, that’s a bug.
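[Editor's note] The check discussed above can be sketched roughly as follows. This is not warcat's actual code or API. The record layout (a plain dict with 'type', 'payload_digest' and 'payload' keys) and both function names are assumptions made for illustration. It only shows the idea arkiver describes: the payload digest on a revisit record describes the payload of the record it points to, so a verifier should not hash the revisit record's own body against it.

```python
import base64
import hashlib


def sha1_base32(data: bytes) -> str:
    """Return a digest in the 'sha1:BASE32' form commonly seen in WARC headers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def check_payload_digest(record: dict):
    """Return True or False for a verified digest, or None when the check is skipped.

    `record` is a hypothetical dict, not a warcat object.
    """
    declared = record.get("payload_digest")
    if declared is None:
        return None  # no digest declared, nothing to verify
    if record["type"] == "revisit":
        # The declared digest belongs to the payload of the referenced record,
        # which is not stored here, so hashing the local body would be wrong.
        return None
    return sha1_base32(record["payload"]) == declared
```

A fuller verifier might instead look up the referenced record and compare the two declared digests; that lookup is left out here.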
12:38 arkiver so
12:38 arkiver anything on the bioware forums?
12:41 WinterFox has quit IRC (Read error: Operation timed out)
12:45 arkiver I see it's in archivebot, nvm
12:49 Medowar pipeline status page borked? http://fos.textfiles.com/pipeline.html
12:53 Simpbrain has quit IRC (Read error: Connection reset by peer)
13:15 arkiver hmm, let me retype <arkiver>xmc: the HTTP headers returned by flickr on the images are not the for different URLs with the same pauload (same image on c1 and c2 for example).
13:16 arkiver xmc: the HTTP headers returned by flickr for the images are not the same for different URLs with the same payload (same image on c1 and c2 for example).
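[Editor's note] A minimal sketch of the trade-off arkiver describes, assuming deduplication happens after crawling. It is not the script mentioned earlier in the log. The tuple layout, the dict fields and the function names are hypothetical. Every URL is still fetched, so each record keeps the headers the server actually sent for that URL, and only a repeated payload is replaced by a revisit entry that points at the first copy.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Digest in the 'sha1:BASE32' form commonly seen in WARC headers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def deduplicate(responses):
    """Turn (url, headers, payload) tuples, in crawl order, into record dicts."""
    first_seen = {}  # digest -> URL of the first record holding that payload
    records = []
    for url, headers, payload in responses:
        digest = payload_digest(payload)
        if digest in first_seen:
            # Same bytes seen before: keep this URL's own headers, drop the body.
            records.append({"type": "revisit", "url": url, "headers": headers,
                            "payload_digest": digest,
                            "refers_to": first_seen[digest]})
        else:
            first_seen[digest] = url
            records.append({"type": "response", "url": url, "headers": headers,
                            "payload_digest": digest, "payload": payload})
    return records
```

Generating the duplicate records without fetching them would save bandwidth, but the per-URL headers stored above would then have to be invented or omitted, which is the concern raised in the log.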
13:37 phuzion has quit IRC (Read error: Operation timed out)
13:40 phuzion has joined #archiveteam
14:22 BlueMaxim has quit IRC (Quit: Leaving)
14:25 VADemon has quit IRC (Quit: left4dead)
14:41 polm has quit IRC (Quit: leaving)
14:48 signius has joined #archiveteam
15:34 _hyperion has joined #archiveteam
15:38 _hyperion is now known as arkiver2
15:39 arkiver2 has quit IRC (Quit: BitchX: try our Windows Me and Windows XP flavors too!)
16:02 MMovie2 has joined #archiveteam
16:03 MMovie has quit IRC (Read error: Operation timed out)
16:21 ravetcofx has joined #archiveteam
16:41 JesseW has joined #archiveteam
16:44 VADemon has joined #archiveteam
17:06 metalcamp has joined #archiveteam
17:10 metal_cam has quit IRC (Read error: Operation timed out)
17:11 metal_cam has joined #archiveteam
17:16 metalcamp has quit IRC (Read error: Operation timed out)
17:25 Infreq has quit IRC (Read error: Operation timed out)
17:29 Infreq has joined #archiveteam
18:37 metalcamp has joined #archiveteam
18:42 metal_cam has quit IRC (Read error: Operation timed out)
19:23 RichardG has quit IRC (Read error: Connection reset by peer)
19:24 RichardG has joined #archiveteam
19:25 metalcamp has quit IRC (Read error: Operation timed out)
19:43 ndiddy has joined #archiveteam
19:54 JesseW has quit IRC (Ping timeout: 370 seconds)
20:09 RichardG has quit IRC (Read error: Operation timed out)
20:10 RichardG has joined #archiveteam
20:21 daank has joined #archiveteam
20:21 daank has quit IRC (Client Quit)
20:34 metalcamp has joined #archiveteam
20:48 Aranje has joined #archiveteam
20:52 schbirid has quit IRC (Quit: Leaving)
20:58 ravetcofx has quit IRC (Read error: Operation timed out)
21:04 metalcamp has quit IRC (Read error: Operation timed out)
21:21 vOYtEC has quit IRC (Ping timeout: 633 seconds)
21:46 all_ has joined #archiveteam
21:47 all_ has quit IRC (Client Quit)
22:15 RichardG has quit IRC (Read error: Operation timed out)
22:16 RichardG has joined #archiveteam
22:19 WinterFox has joined #archiveteam
22:24 VADemon has quit IRC (Quit: left4dead)
22:45 WinterFox has quit IRC (Read error: Operation timed out)
23:12 JesseW has joined #archiveteam
23:14 Honno has quit IRC (Read error: Operation timed out)
23:33 arkiver2_ has joined #archiveteam
23:33 arkiver2_ has quit IRC (Client Quit)
23:51 xmc ah, huh
23:51 xmc ok
23:54 verizon has joined #archiveteam
23:56 verizon hello =)
23:56 xmc sorry, i don't like verizon
23:56 verizon me neither
23:57 verizon but who is the less worse
23:57 verizon that is the question
23:57 verizon so... any ideas
