#archiveteam-bs 2017-08-19,Sat

Time Nickname Message
01:53 🔗 username1 has joined #archiveteam-bs
01:55 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
01:59 🔗 pizzaiolo has joined #archiveteam-bs
02:28 🔗 j08nY has quit IRC (Remote host closed the connection)
02:31 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
02:32 🔗 Mateon1 has joined #archiveteam-bs
03:21 🔗 Ravenloft has joined #archiveteam-bs
03:23 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
03:26 🔗 bluesoul so, i have my first warc and it is confirmed working and playable with web archive player 1.4.7
03:26 🔗 bluesoul now what?
03:26 🔗 qw3rty111 has joined #archiveteam-bs
03:28 🔗 DFJustin https://archive.org/upload/
03:29 🔗 bluesoul i guess the web uploader has settings to allow for a 1GB+ warc that might take half an hour to upload
03:30 🔗 bluesoul do i throw this cdx file in with it?
03:30 🔗 qw3rty119 has quit IRC (Read error: Operation timed out)
03:32 🔗 fie has quit IRC (Ping timeout: 250 seconds)
03:35 🔗 bluesoul eh, we'll try it
03:42 🔗 DFJustin you don't need to, it will generate one
03:42 🔗 DFJustin shouldn't hurt anything though
03:43 🔗 bluesoul we will find out if it hurt anything eventually
03:43 🔗 bluesoul i suspect nothing will catch fire
03:52 🔗 Asparagir has joined #archiveteam-bs
04:02 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
04:17 🔗 Stilett0 has joined #archiveteam-bs
04:22 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:22 🔗 TheLovina has quit IRC (Ping timeout: 370 seconds)
04:22 🔗 TheLovina has joined #archiveteam-bs
04:26 🔗 bluesoul https://archive.org/details/dep.ca does this look right?
04:29 🔗 Sk1d has joined #archiveteam-bs
04:32 🔗 DFJustin seems ok
04:33 🔗 DFJustin I'd recommend putting a crawl date on it as well
04:35 🔗 bluesoul would that just be the generic date field? wasn't sure if that was supposed to be the age of the data or the archive
04:45 🔗 DFJustin this is how IA's go in https://archive.org/details/WIDE-20170819012251-crawl802
04:45 🔗 DFJustin I don't think the details matter that much as long as it's basically sensible
04:46 🔗 DFJustin something so you can tell it apart if the same site gets crawled more than once
04:46 🔗 bluesoul ah i see, firstfiledate and lastfiledate
04:47 🔗 DFJustin those are probably internal use fields
04:47 🔗 DFJustin the exact timestamps are stored in the warc itself
04:51 🔗 DFJustin yeah I think it generates those fields when they process it for the wayback machine, here's one I uploaded and I never filled in firstfiledate etc https://archive.org/details/archive.pdp11.org.ru-20130504
04:51 🔗 DFJustin but I put a date in the title and item name for identification when browsing through
04:53 🔗 bluesoul okay, i think i'm starting to understand the logic/structure of things
04:53 🔗 bluesoul i need to recategorize this as a "just in time grab" as the company's bankrupt so i just change the collection name to archiveteam-fire yeah?
04:54 🔗 DFJustin you can't, only IA admins have access to do that
04:58 🔗 DFJustin for now adding archiveteam to the subject keywords would be good
05:00 🔗 DFJustin btw for small jobs like this we have #archivebot which does the crawl and upload automatically if you just give it a base URL
05:00 🔗 bluesoul cool, archivebot was brought up but this was a good learning experience for me as well
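The naming advice above (a crawl date in the title and item name, plus archiveteam in the subject keywords) can be sketched roughly as follows. This is only an illustration: `build_item_metadata` is a hypothetical helper, the title wording is made up, and the field names are assumed to match the usual archive.org metadata keys. It uploads nothing.

```python
from datetime import date


def build_item_metadata(site: str, crawl_date: date) -> dict:
    """Hypothetical helper: derive an item name and metadata for a WARC upload."""
    stamp = crawl_date.strftime("%Y%m%d")  # e.g. 20130504
    return {
        # Date in the item name so repeat crawls of the same site can be told apart.
        "identifier": f"{site}-{stamp}",
        "metadata": {
            # Assumed field names; the title wording is illustrative only.
            "title": f"{site} crawl {crawl_date.isoformat()}",
            "date": crawl_date.isoformat(),
            "subject": ["archiveteam"],
        },
    }


item = build_item_metadata("archive.pdp11.org.ru", date(2013, 5, 4))
print(item["identifier"])  # archive.pdp11.org.ru-20130504
```

The example reuses the site and date from DFJustin's own upload, so the resulting item name matches the one linked above.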
05:22 🔗 Asparagir has quit IRC (Asparagir)
05:56 🔗 Honno has joined #archiveteam-bs
06:08 🔗 alfie has quit IRC (Ping timeout: 260 seconds)
06:55 🔗 BlueMaxim has joined #archiveteam-bs
07:08 🔗 Ravenloft has quit IRC (Read error: Connection reset by peer)
07:12 🔗 Honno_ has joined #archiveteam-bs
07:22 🔗 Honno has quit IRC (Read error: Operation timed out)
07:26 🔗 schbirid2 has joined #archiveteam-bs
07:29 🔗 username1 has quit IRC (Read error: Operation timed out)
07:45 🔗 fie has joined #archiveteam-bs
09:26 🔗 pie_ has joined #archiveteam-bs
09:26 🔗 pie_ hi guys, is there a way to get archive to start recursively archiving a website?
09:28 🔗 pie_ also hm, so archive.is isn't a frontend for archive.org?
09:31 🔗 zyphlar has joined #archiveteam-bs
09:31 🔗 Aoede it isn't
09:31 🔗 Aoede for archiving a website see #archivebot
09:31 🔗 schbirid2 has quit IRC (Quit: Leaving)
09:32 🔗 Aoede pie_ ^
09:33 🔗 pie_ thanks
09:57 🔗 schbirid has joined #archiveteam-bs
10:04 🔗 j08nY has joined #archiveteam-bs
10:04 🔗 Lotheric bluesoul: I downloaded the torrent, what do I do with a .warc file ?
10:05 🔗 schbirid cherish it
10:05 🔗 schbirid archive it
10:05 🔗 schbirid love it
10:05 🔗 schbirid hexdump it
10:06 🔗 Lotheric :)
10:19 🔗 Lotheric I got it to work with webarchiveplayer.exe
10:19 🔗 Lotheric sweet! thanks a lot :)
10:19 🔗 Lotheric Seems like it got everything
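For anyone who would rather peek inside a .warc than play it back, here is a minimal standard-library sketch that lists the header fields of each record. It assumes the plain WARC/1.0 layout (a version line, "Name: value" headers, a blank line, then Content-Length bytes of payload) and ignores edge cases such as folded header lines. `iter_warc_headers` and `open_warc` are hypothetical names, not part of any tool mentioned here.

```python
import gzip
import io


def open_warc(path):
    """Open a .warc or .warc.gz file for binary reading."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_headers(stream):
    """Yield a dict of header fields for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the payload; this sketch reads it into memory and discards it.
        stream.read(int(headers.get("Content-Length", 0)))
        yield headers


# Tiny hand-made sample with two records, used instead of a real crawl.
sample = (
    b"WARC/1.0\r\nWARC-Type: warcinfo\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
    b"WARC/1.0\r\nWARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\nContent-Length: 2\r\n\r\nok\r\n\r\n"
)
records = list(iter_warc_headers(io.BytesIO(sample)))
for rec in records:
    print(rec["WARC-Type"], rec.get("WARC-Target-URI", "-"))
```

On a real file the same loop would be fed from `open_warc("some-crawl.warc.gz")`, where the filename is a placeholder.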
11:07 🔗 Mateon1 has quit IRC (Remote host closed the connection)
11:11 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
11:51 🔗 ld1 has quit IRC (Ping timeout: 260 seconds)
11:52 🔗 Mateon1 has joined #archiveteam-bs
12:00 🔗 zyphlar has quit IRC (Quit: Connection closed for inactivity)
13:24 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
13:24 🔗 RichardG has joined #archiveteam-bs
14:00 🔗 odemg SketchCow, replied on reddit
14:06 🔗 SketchCow Sorry, I never assume I know the people, even though I often do.
14:07 🔗 SketchCow Just give me the link when I can grab it, I'll pull them down, make a collection, shove them up.
14:07 🔗 SketchCow I wish someone would write descriptions of them
14:09 🔗 odemg SketchCow, sound, his upload is going steady so I'm not syncing to the dhevel server until it's all done then I'll put it in all the places, post the torrent and nudge you here.
14:11 🔗 odemg At his current speed it should be done in a little over 5 hours.
14:13 🔗 SketchCow Great.
14:13 🔗 SketchCow I'll be out on a date, will be back tonight, we'll have a nice collection
14:16 🔗 odemg Ohh sweet, have fun! I'll be having a bbq myself so late tonight works :D
14:23 🔗 arkiver what is this about?
14:51 🔗 pie_ the fuck https://www.youtube.com/watch?v=CO-NaKJIXPA
14:51 🔗 pie_ fuck wrong channel
15:35 🔗 joepie91 this fella needs some archival directed at them, I think: https://twitter.com/themaddimension
15:35 🔗 joepie91 one of the 'unite the right' people, apparently deleting tweets
16:25 🔗 BnAboyZ66 has quit IRC (Ping timeout: 260 seconds)
16:27 🔗 fie has quit IRC (Ping timeout: 250 seconds)
16:28 🔗 Meroje has quit IRC (Ping timeout: 260 seconds)
16:33 🔗 Meroje has joined #archiveteam-bs
16:37 🔗 fie has joined #archiveteam-bs
16:53 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
17:01 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
17:02 🔗 brayden has quit IRC (Read error: Connection reset by peer)
17:02 🔗 brayden_ has joined #archiveteam-bs
17:02 🔗 swebb sets mode: +o brayden_
17:02 🔗 brayden_ is now known as brayden
17:10 🔗 j08nY has quit IRC (Read error: Operation timed out)
17:27 🔗 Pudsey has joined #archiveteam-bs
17:27 🔗 Asparagir has joined #archiveteam-bs
17:32 🔗 Pudsey_ has joined #archiveteam-bs
17:35 🔗 pizzaiolo has joined #archiveteam-bs
17:35 🔗 Pudsey has quit IRC (Ping timeout: 245 seconds)
17:42 🔗 BartoCH has joined #archiveteam-bs
17:49 🔗 odemg has quit IRC (Read error: Operation timed out)
18:12 🔗 Pudsey_ has quit IRC (Remote host closed the connection)
18:15 🔗 RichardG has joined #archiveteam-bs
18:24 🔗 RichardG_ has joined #archiveteam-bs
18:28 🔗 RichardG has quit IRC (Ping timeout: 370 seconds)
18:41 🔗 schbirid2 has joined #archiveteam-bs
18:44 🔗 schbirid has quit IRC (Read error: Operation timed out)
18:51 🔗 Odd0002 has quit IRC (Remote host closed the connection)
18:56 🔗 JAA joepie91: I already archived him five days ago. :-)
18:58 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
19:00 🔗 RichardG has joined #archiveteam-bs
19:09 🔗 HCross2 Kenshin: you around?
19:10 🔗 RichardG has quit IRC (Read error: No route to host)
19:11 🔗 RichardG has joined #archiveteam-bs
19:18 🔗 RichardG_ has joined #archiveteam-bs
19:20 🔗 RichardG has quit IRC (Ping timeout: 250 seconds)
19:21 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
19:22 🔗 RichardG has joined #archiveteam-bs
19:35 🔗 Asparagir Here are some posts that ArchiveTeam might find interesting, about Twitter archiving and massive data processing of the tweetstreams:
19:35 🔗 Asparagir https://inkdroid.org/2017/08/15/utr/
19:36 🔗 Asparagir and its follow-up post https://inkdroid.org/2017/08/18/delete-forensics/
19:37 🔗 Asparagir It ran off a collection of 165,314 tweets, of which 16,492 (9.9%) were later deleted. Has interesting stats and musings.
19:47 🔗 Odd0002 has joined #archiveteam-bs
20:00 🔗 wp494 has quit IRC (Read error: Connection reset by peer)
20:10 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
20:15 🔗 Odd0002 has joined #archiveteam-bs
20:16 🔗 Lotheric has quit IRC (Leaving)
21:12 🔗 wp494 has joined #archiveteam-bs
21:20 🔗 ZexaronS has joined #archiveteam-bs
21:36 🔗 TheLovina has quit IRC (Ping timeout: 370 seconds)
21:39 🔗 Famicoman has joined #archiveteam-bs
21:41 🔗 Stilett0 has quit IRC (Ping timeout: 250 seconds)
21:55 🔗 godane so i have uploaded 2527 items so far this month
22:01 🔗 Asparagir Awesome!
22:06 🔗 Odd0002 has quit IRC (Remote host closed the connection)
22:12 🔗 Odd0002 has joined #archiveteam-bs
22:28 🔗 Odd0002_ has joined #archiveteam-bs
22:29 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
22:32 🔗 Odd0002_ has quit IRC (Remote host closed the connection)
22:40 🔗 Odd0002 has joined #archiveteam-bs
22:56 🔗 pie_ has quit IRC (Ping timeout: 246 seconds)
23:27 🔗 Asparagir has quit IRC (Read error: Connection reset by peer)
23:29 🔗 Asparagir has joined #archiveteam-bs
23:50 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:55 🔗 dashcloud has joined #archiveteam-bs
23:55 🔗 Ravenloft has joined #archiveteam-bs
23:55 🔗 Ravenloft http://www.blackfalcongames.net/?p=183
