#archiveteam 2015-03-29,Sun


Time Nickname Message
00:00 🔗 serapeum has joined #archiveteam
00:12 🔗 caber has joined #archiveteam
00:13 🔗 cadbury_ has joined #archiveteam
00:14 🔗 c_b2 has joined #archiveteam
00:14 🔗 c_b has quit IRC (Ping timeout: 260 seconds)
00:16 🔗 c_b2 is now known as c_b
00:17 🔗 aNthraXx has joined #archiveteam
00:17 🔗 garyrh https://www.youtube.com/watch?v=1L1EIUIjBfk
00:20 🔗 NovaKing_ has joined #archiveteam
00:24 🔗 mistym_ has quit IRC (Remote host closed the connection)
00:24 🔗 mistym has joined #archiveteam
00:25 🔗 primus has quit IRC (Read error: Operation timed out)
00:25 🔗 joepie91_ [18:52] <pir^2> Is there a process to request adding a WARC to Wayback Machine?
00:25 🔗 joepie91_ the process consists of: talk to SketchCow
00:25 🔗 joepie91_ :P
00:25 🔗 joepie91_ ref https://archive.org/details/pdp10nocrew https://archive.org/details/dmoz-rdf-20150327
00:26 🔗 joepie91_ since pir^2 is apparently gone
00:26 🔗 joepie91_ also cc Kazzy aaaaaaaaa garyrh
00:26 🔗 joepie91_ in case somebody else asks
00:42 🔗 NovaKing_ has quit IRC (Read error: Operation timed out)
00:43 🔗 cadbury_ has quit IRC (Read error: Operation timed out)
00:43 🔗 aNthraXx has quit IRC (Read error: Operation timed out)
00:46 🔗 caber has quit IRC (Read error: Operation timed out)
00:49 🔗 BlueMaxim has quit IRC (Quit: Leaving)
00:51 🔗 BlueMaxim has joined #archiveteam
00:51 🔗 caber has joined #archiveteam
00:51 🔗 NovaKing_ has joined #archiveteam
00:52 🔗 primus104 has quit IRC (Leaving.)
00:53 🔗 cadbury_ has joined #archiveteam
00:56 🔗 aNthraXx has joined #archiveteam
01:17 🔗 SimpBrain has quit IRC (Ping timeout: 512 seconds)
01:25 🔗 pir^2 has joined #archiveteam
01:25 🔗 pir^2 Another one for Wayback Machine - https://archive.org/details/debatesoireachtasie-XML
01:26 🔗 Kazzy pir^2: [00:25:40] <joepie91_> [18:52] <pir^2> Is there a process to request adding a WARC to Wayback Machine?
01:26 🔗 Kazzy [00:25:48] <joepie91_> the process consists of: talk to SketchCow
01:26 🔗 pir^2 I read that on the logs :P
01:26 🔗 Kazzy ah right, didn't know you were reading them :p
01:27 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
01:33 🔗 pir^2 has quit IRC (Ping timeout: 370 seconds)
01:38 🔗 pir^2 has joined #archiveteam
01:39 🔗 schbirid2 has joined #archiveteam
01:39 🔗 pir^2 why has the dmoz one been deriving for hours?
01:40 🔗 pir^2 Kazzy, can you see this? https://archive.org/catalog.php?history=1&identifier=dmoz-rdf-20150327
01:41 🔗 Kazzy i see a task waiting for an admin
01:41 🔗 kyan "FATAL ERROR: no file-level CDXs found in this item; rerun to clear redrow and update files.xml"
01:42 🔗 kyan The derive hasn't started yet; see https://archive.org/catalog.php?whereami=1 to see where you are in the queue
01:45 🔗 wp494 has quit IRC (Read error: No route to host)
01:47 🔗 pir^2 So I need to wait for an admin? How long does that take?
01:55 🔗 pir^2 has quit IRC (Ping timeout: 370 seconds)
02:31 🔗 wp494 has joined #archiveteam
02:33 🔗 SketchCow What up.
02:37 🔗 godane i'm uploading more funny or die videos
02:40 🔗 SketchCow Excellent.
02:42 🔗 godane also westworth military academy yearbooks are uploaded: https://archive.org/search.php?query=creator%3A%22Westworth+Military+Academy%22
02:43 🔗 SketchCow Where did those come from, anyway?
02:45 🔗 godane wma.edu
02:46 🔗 godane i was searching for pri the world and one of their pdfs came up
02:47 🔗 SketchCow Neat
03:00 🔗 godane SketchCow: i would like some help getting the pri the world podcast
03:00 🔗 godane i can't find anything before 2010-06-17
03:01 🔗 godane i'm trying to find the full shows so we can have a collection on IA
03:05 🔗 SoJa has quit IRC (Quit: Page closed)
03:14 🔗 Ymgve has quit IRC ()
03:20 🔗 necenzura has joined #archiveteam
03:21 🔗 necenzura has quit IRC (Client Quit)
03:21 🔗 necenzura has joined #archiveteam
03:25 🔗 necenzura for real, rapidshare archiving?
03:38 🔗 garyrh godane, this has some http://web.archive.org/web/*/http://media.theworld.org/*
03:38 🔗 garyrh ex. http://web.archive.org/web/20120912192455/http://media.theworld.org/audio/010209full.mp3
03:43 🔗 godane i got all of those
03:53 🔗 necenzura has quit IRC (Quit: Page closed)
03:55 🔗 SketchCow I'm adding so many weird cheats to this thing
04:00 🔗 mistym has quit IRC (Remote host closed the connection)
04:04 🔗 dashcloud has quit IRC (Read error: Operation timed out)
04:07 🔗 SketchCow https://ia902502.us.archive.org/22/items/archiveteam_archivebot_go_068/00_coverimage.png
04:09 🔗 johtso some great fonts on those yearbooks https://archive.org/stream/Westworth_Military_Academy_Yearbook_1882/WMA-1882#page/n17/mode/2up
04:11 🔗 dashcloud has joined #archiveteam
04:12 🔗 aaaaaaaaa has quit IRC (Leaving)
04:15 🔗 john has joined #archiveteam
04:16 🔗 john I have a silly question.
04:17 🔗 john How can you mirror the whole of telecomix.org (it's only 12 pages or so) with wget, and download the hotlinked images too without having wget hop onto other sites, and without using -D?
04:24 🔗 mistym has joined #archiveteam
04:37 🔗 chfoo john: you can use wpull and --span-hosts-allow=page-requisites
04:41 🔗 john Does that support the warc format?
04:41 🔗 yipdw john: yeah
04:41 🔗 john Good. Thanks.
04:42 🔗 yipdw john: fwiw wpull is also ArchiveBot's crawler, which has been injecting WARCs into IA's Wayback Machine for about a year now
04:42 🔗 john Neat.
04:43 🔗 john Out of curiosity, is archiveteam the only group allowed to add directly to the wayback machine, or are there others?
04:43 🔗 xmc we are the only one who does
04:44 🔗 john All right.
04:44 🔗 yipdw well, there's internal IA groups
04:48 🔗 john Can third parties have warc archives integrated after some review?
04:49 🔗 john I'm not quite sure how that could work if the archived site is down, since warc files seem like they could be hand-made.
04:55 🔗 SketchCow Archive Team is the only one
04:58 🔗 john webarchiviewer can't find the html files create in warc files created by wpull for some reason. What do you usually use to view or "replay" these?
04:58 🔗 john *created
04:58 🔗 john *webarchiveplayer
04:59 🔗 Start_ has joined #archiveteam
04:59 🔗 Start has quit IRC (Read error: Connection reset by peer)
05:04 🔗 yipdw john: works fine for me
05:04 🔗 yipdw what WARC are you using?
05:04 🔗 john yipdw: The one I just created with wpull.
05:05 🔗 yipdw I need the WARC to verify, otherwise I'm afraid I can't offer anything more substantial than "it works for me"
05:06 🔗 john http://a.pomf.se/fvjica.warc
05:06 🔗 john I generated it with `wpull telecomix.org --warc-file telecomix.org --span-hosts-allow=page-requisites --page-requisites'.
05:06 🔗 brayden has joined #archiveteam
05:07 🔗 yipdw john: try http://localhost:8090/replay/20150329050054/http://telecomix.org
05:08 🔗 john Well, that works.
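The replay URL above follows pywb's `/<collection>/<timestamp>/<url>` pattern, with a 14-digit `YYYYMMDDhhmmss` capture timestamp. A minimal sketch for building such URLs, assuming the webarchiveplayer defaults seen in the log (`localhost:8090`, collection `replay`); `build_replay_url` is a hypothetical helper, not part of pywb:

```python
from datetime import datetime

def build_replay_url(captured: datetime, target: str,
                     host: str = "localhost:8090",
                     collection: str = "replay") -> str:
    """Build a pywb-style replay URL: http://<host>/<collection>/<timestamp>/<target>.

    host and collection default to the values seen in the log; both are
    assumptions about a local webarchiveplayer setup.
    """
    # Wayback-style capture timestamps are 14 digits: YYYYMMDDhhmmss
    timestamp = captured.strftime("%Y%m%d%H%M%S")
    return f"http://{host}/{collection}/{timestamp}/{target}"

if __name__ == "__main__":
    print(build_replay_url(datetime(2015, 3, 29, 5, 0, 54), "http://telecomix.org"))
```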
05:09 🔗 john It's because it doesn't end with html I guess.
05:09 🔗 yipdw I don't know how pywb/webarchiveplayer's listing works, I haven't looked too closely at it
05:09 🔗 john Well, it looks like it's extension based.
05:09 🔗 yipdw but the detection heuristics might just need some work
05:10 🔗 john May I ask what you use? In the future I'd like to see individual files in the archive as well.
05:10 🔗 yipdw john: at least one strategy in webarchiveplayer is MIME-type based, not extension-based
05:10 🔗 john Huh. Weird.
05:10 🔗 yipdw so yeah I don't know why it's not showing up
05:10 🔗 yipdw anyway
05:11 🔗 john So maybe it's the webmaster's fault?
05:11 🔗 c_b has quit IRC (Quit: c_b)
05:11 🔗 yipdw no, the response comes back as text/html
05:11 🔗 john Okay, then I have no idea.
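To illustrate the two heuristics discussed here, a hypothetical sketch (not webarchiveplayer's actual code) contrasting an extension-based page check with a MIME-type-based one. An extension check misses `http://telecomix.org`, whose path has no `.html`, while a Content-Type check accepts it because the response is `text/html`:

```python
from urllib.parse import urlsplit

def looks_like_page_by_extension(url: str) -> bool:
    """Hypothetical heuristic: treat only .html/.htm paths as pages."""
    path = urlsplit(url).path.lower()
    return path.endswith((".html", ".htm"))

def looks_like_page_by_mime(content_type: str) -> bool:
    """Hypothetical heuristic: trust the response's Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")
```

Since the response in this WARC is `text/html`, a MIME-based check alone would not have hidden the page.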
05:12 🔗 john Is there a more capable program you can recommend, though?
05:12 🔗 godane i was just thinking of setting up a web archive player for mesh network
05:12 🔗 yipdw john: no
05:12 🔗 yipdw webarchiveplayer is the best one I know of
05:12 🔗 john All right.
05:12 🔗 yipdw in terms of fidelity, it exceeds Wayback
05:12 🔗 john I'll go looking.
05:12 🔗 godane that way people can download archiveteam stuff and use it on a mesh network
05:12 🔗 yipdw if you want a listing, try https://github.com/ArchiveTeam/warc-proxy
05:13 🔗 yipdw however, keep in mind that that is an HTTP proxy server and its use is more involved
05:13 🔗 john Okay.
05:14 🔗 john I can create a firefox profile. No big deal for me.
05:14 🔗 yipdw oh, I think I know what happened
05:14 🔗 yipdw File "build/bdist.linux-x86_64/egg/pywb/utils/canonicalize.py", line 40, in canonicalize
05:14 🔗 yipdw raise UrlCanonicalizeException('Invalid Url: ' + url)
05:14 🔗 yipdw UrlCanonicalizeException: Invalid Url: urn:X-wpull:log
05:15 🔗 yipdw that's an error we've seen before
05:15 🔗 yipdw current pywb builds don't like WARC records with URNs
05:15 🔗 john Okay.
05:18 🔗 yipdw anyway, the author of pywb is aware of wpull and its custom records, so this is likely to be fixed soon
05:21 🔗 yipdw john: you can try the wpull option --no-warc-keep-log to not keep the wpull log in the WARC, which may help
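The traceback shows pywb's canonicalizer rejecting the record whose target is `urn:X-wpull:log`. A minimal, hypothetical sketch for spotting such non-HTTP records before replay, assuming an uncompressed `.warc`; for a `.warc.gz`, open it with `gzip.open(path, "rb")`, which reads concatenated gzip members. The function names are illustrative, and a full WARC library is the safer choice for real files:

```python
import io
from typing import BinaryIO, Dict, Iterator, List

def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield each record's header fields (names lower-cased) from an
    uncompressed WARC stream, skipping the record bodies."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            field = raw.rstrip(b"\r\n")
            if not field:
                break  # blank line ends the header block
            name, _, value = field.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        stream.read(int(headers.get("content-length", "0")))  # skip the body
        yield headers

def non_http_targets(stream: BinaryIO) -> List[str]:
    """List WARC-Target-URI values that are not http(s) URLs."""
    found = []
    for headers in iter_warc_headers(stream):
        target = headers.get("warc-target-uri", "")
        if target and not target.startswith(("http://", "https://")):
            found.append(target)
    return found

if __name__ == "__main__":
    # Tiny synthetic WARC: one normal response, one wpull-style log record.
    sample = (
        b"WARC/1.0\r\nWARC-Type: response\r\n"
        b"WARC-Target-URI: http://telecomix.org/\r\nContent-Length: 5\r\n\r\n"
        b"hello\r\n\r\n"
        b"WARC/1.0\r\nWARC-Type: resource\r\n"
        b"WARC-Target-URI: urn:X-wpull:log\r\nContent-Length: 3\r\n\r\n"
        b"log\r\n\r\n"
    )
    print(non_http_targets(io.BytesIO(sample)))  # prints ['urn:X-wpull:log']
```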
05:21 🔗 john That work-proxy is kind of nice.
05:21 🔗 john *warc-proxy
05:21 🔗 john yipdw: Thanks.
05:22 🔗 yipdw john: another possibility is --warc-move, which might be overkill, but as a side effect it places all wpull metadata records in a -meta.warc.gz file
05:23 🔗 yipdw er
05:23 🔗 yipdw --warc-max-size, not --warc-move
05:23 🔗 john Thanks.
05:23 🔗 john I can see that being used, since I offer to distribute warc archives often.
05:36 🔗 john yipdw: --warc-max-size expects an argument.
05:36 🔗 yipdw yeah, it's the max size
05:37 🔗 john Oh, I see. So just set it to any large number?
05:37 🔗 yipdw whatever number makes sense for your environment
05:37 🔗 yipdw in archivebot we use 5 GiB
05:38 🔗 john What if a site was really larger than 5 GiB? You just make sure it's doing the right thing?
05:39 🔗 yipdw --warc-max-size is a threshold size for each WARC, not the maximum fetch size
05:39 🔗 yipdw see e.g. http://wpull.readthedocs.org/en/master/options.html
05:39 🔗 john Oh, all right.
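Pulling together the wpull flags mentioned in this exchange, a sketch that assembles the command line as an argument list (e.g. for `subprocess.run`). It only builds the list and does not run wpull. `wpull_args` is a hypothetical helper, and the 5 GiB default mirrors ArchiveBot's setting, written in bytes on the assumption that `--warc-max-size` takes a byte count:

```python
GIB = 1024 ** 3  # 5 GiB = 5 * 1024**3 = 5368709120 bytes

def wpull_args(url: str, warc_name: str, max_warc_gib: int = 5,
               keep_log: bool = False) -> list:
    """Build a wpull argv using only flags named in the discussion."""
    args = [
        "wpull", url,
        "--warc-file", warc_name,
        "--page-requisites",
        "--span-hosts-allow=page-requisites",  # fetch off-site images/CSS only
        # Per-file rollover threshold, not a cap on the total crawl size.
        "--warc-max-size", str(max_warc_gib * GIB),
    ]
    if not keep_log:
        # Keep the urn:X-wpull:log record out of the WARC (see pywb issue above).
        args.append("--no-warc-keep-log")
    return args
```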
05:42 🔗 john Out of curiosity, might I be able to get wpull to use my browser cookies?
05:43 🔗 yipdw if you can export them in Mozilla's cookies.txt format, yeah
05:43 🔗 mistym has quit IRC (Remote host closed the connection)
05:43 🔗 yipdw see e.g. --load-cookies
05:45 🔗 john Thanks.
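wpull's `--load-cookies` expects the Mozilla/Netscape cookies.txt format. A small sketch using Python's standard `http.cookiejar.MozillaCookieJar` to sanity-check an exported file before handing it to wpull; the domain, cookie name, value, and helper names are made up for illustration:

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# cookies.txt format: a magic header line, then one cookie per line with
# tab-separated fields: domain, include-subdomains flag, path, secure flag,
# expiry (Unix time), name, value. Values below are made up.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".example.org\tTRUE\t/\tFALSE\t4102444800\tsession\tabc123\n"
)

def load_cookies_txt(path: str) -> dict:
    """Load a Mozilla-format cookies.txt and return {name: value}."""
    jar = MozillaCookieJar()
    jar.load(path)  # raises http.cookiejar.LoadError on a malformed file
    return {cookie.name: cookie.value for cookie in jar}

def check_text(text: str) -> dict:
    """Write text to a temporary file and parse it as cookies.txt."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write(text)
    try:
        return load_cookies_txt(fh.name)
    finally:
        os.remove(fh.name)

if __name__ == "__main__":
    print(check_text(SAMPLE))
```

If this loads cleanly, the same file is a reasonable candidate for `--load-cookies`.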
06:25 🔗 primus104 has joined #archiveteam
06:30 🔗 JMC has quit IRC (Read error: Operation timed out)
06:34 🔗 SketchCow Question for the code nerds
06:34 🔗 SketchCow Well, first, john - use archiveteam-bs for something this long.
06:36 🔗 john As long as what?
06:40 🔗 SketchCow As long as this discussion went
06:40 🔗 SketchCow Have the silly questions in #archiveteam-bs
06:44 🔗 john A, okay.
06:44 🔗 john *Ah
06:55 🔗 garyrh Code for the Code Gods.
06:57 🔗 john Okay, now I must know, where is that from? Originally I thought it was an 8chan thing, but apparently it's everywhere now.
07:15 🔗 SketchCow Talk about it in -bs
07:23 🔗 signius has quit IRC (Read error: Operation timed out)
07:36 🔗 signius has joined #archiveteam
09:10 🔗 schbirid2 has quit IRC (Leaving)
09:33 🔗 schbirid has joined #archiveteam
10:22 🔗 scyther has joined #archiveteam
10:32 🔗 SimpBrain has joined #archiveteam
10:38 🔗 Ymgve has joined #archiveteam
10:56 🔗 svchfoo1 has quit IRC (Read error: Connection reset by peer)
10:59 🔗 svchfoo1 has joined #archiveteam
10:59 🔗 svchfoo2 sets mode: +o svchfoo1
11:31 🔗 schbirid https://github.com/cloudfs/ftp-cloudfs
11:43 🔗 will has left Textual IRC Client: www.textualapp.com
11:44 🔗 will has joined #archiveteam
12:00 🔗 scyther has quit IRC (Leaving)
12:27 🔗 primus104 has quit IRC (Leaving.)
12:40 🔗 [Beta] has joined #archiveteam
12:45 🔗 lysobit has quit IRC (Quit: quit)
12:49 🔗 habi has joined #archiveteam
12:49 🔗 habi has left
12:55 🔗 lysobit has joined #archiveteam
13:07 🔗 habi has joined #archiveteam
14:01 🔗 SimpBrain has quit IRC (Ping timeout: 512 seconds)
14:30 🔗 primus104 has joined #archiveteam
14:38 🔗 philpem has joined #archiveteam
14:39 🔗 primus104 has quit IRC (Leaving.)
15:19 🔗 underscor has quit IRC (Ping timeout: 370 seconds)
15:24 🔗 habi has left
15:28 🔗 underscor has joined #archiveteam
15:28 🔗 swebb sets mode: +o underscor
15:28 🔗 primus104 has joined #archiveteam
15:47 🔗 underscor has quit IRC (Ping timeout: 370 seconds)
15:47 🔗 brayden has quit IRC (Ping timeout: 606 seconds)
16:06 🔗 Start_ is now known as Start
16:06 🔗 Start has quit IRC (Disconnected.)
16:06 🔗 Start has joined #archiveteam
16:06 🔗 Start has quit IRC (Remote host closed the connection)
16:06 🔗 Start has joined #archiveteam
16:26 🔗 SketchCow As part of my return to normal life this week, I'm working very hard to kill out all sorts of waiting piles of data on FOS (yes, again). I'll be asking about some of the upload jobs to see which are done.
16:27 🔗 brayden has joined #archiveteam
16:44 🔗 SketchCow Testflight is now getting uploaded.
16:44 🔗 SketchCow https://archive.org/details/archiveteam_testflight&tab=collection
16:53 🔗 SketchCow I've written a script that goes to an archivebot collection and says "get the most complicated webpage grab and make that the cover image".
16:54 🔗 SketchCow It's going to make that collection look sweeeeeeeet
16:55 🔗 underscor has joined #archiveteam
16:55 🔗 swebb sets mode: +o underscor
16:56 🔗 xmc same metric as usual: screenshot every page and look for the largest-by-kb image?
16:57 🔗 xmc nice
16:57 🔗 SketchCow Yes
16:57 🔗 xmc i love how simple that is
16:58 🔗 SketchCow For the purposes of beauty, which is arbitrary, it works well.
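The cover-image metric xmc describes (screenshot every page, keep the largest file) is simple enough to sketch. This is a hypothetical illustration, not SketchCow's script; it assumes the screenshots are already rendered as PNG files in one directory:

```python
import tempfile
from pathlib import Path
from typing import Optional

def pick_cover(screenshot_dir: str, pattern: str = "*.png") -> Optional[Path]:
    """Return the largest file matching pattern (by size in bytes), or None."""
    candidates = [p for p in Path(screenshot_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # A bigger compressed screenshot roughly means more visual detail.
    return max(candidates, key=lambda p: p.stat().st_size)

if __name__ == "__main__":
    # Demo with two fake "screenshots" of different sizes.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "plain.png").write_bytes(b"\x00" * 100)
        (Path(d) / "busy.png").write_bytes(b"\x00" * 5000)
        print(pick_cover(d).name)  # prints busy.png
```

Blank or broken renders still need a human pass, as described next.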
16:58 🔗 SketchCow http://teamarchive1.fnf.archive.org/SCREENCHECK/SHOWBOAT/
16:58 🔗 Nemo_bis beauty ~ entropy
16:58 🔗 SketchCow So, that is what it shows me when I say "show me all the screenshots you generated for this item"
16:58 🔗 SketchCow I then have it asking me, in a loop, which need to die.
16:59 🔗 SketchCow So, I'm killing the ones that are blank or awful
17:06 🔗 SketchCow Definitely going to take the machine at LEAST a week to go through all these.
17:08 🔗 khaoohs has joined #archiveteam
17:08 🔗 * Sanqui is gonna make a webpage that consists of random pixels and ArchiveBot it
17:08 🔗 Sanqui beauty confirmed
17:08 🔗 SketchCow Nice, the thumbnailer for collections is running even faster these days. Some of the items already have the improved cover.
17:09 🔗 khaoohs_ has quit IRC (Read error: Operation timed out)
17:09 🔗 Sanqui but seriously, awesome work!
17:09 🔗 khaoohs_ has joined #archiveteam
17:09 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
17:10 🔗 SketchCow A lot of it is understanding what Brewster wants.
17:10 🔗 khaoohs_ has quit IRC (Read error: No route to host)
17:10 🔗 khaoohs_ has joined #archiveteam
17:10 🔗 SketchCow With the highly visual aspect of v2 coming in, he wants this nice collection to be not just useful but pretty, maybe even exquisite.
17:10 🔗 SketchCow He wouldn't leave 60% of the usable space of a building as an insane church of petabytes if he didn't.
17:11 🔗 SketchCow He'd gut it like a fuckin' fish
17:12 🔗 SketchCow Then again: https://archive.org/details/archiveteam_archivebot_go_20141016100002
17:12 🔗 SketchCow That'll give any sane man a heart attack
17:14 🔗 xmc good thing i'm not
17:20 🔗 mistym has joined #archiveteam
17:23 🔗 aaaaaaaaa has joined #archiveteam
17:29 🔗 khaoohs has joined #archiveteam
17:31 🔗 khaoohs_ has quit IRC (Ping timeout: 370 seconds)
17:31 🔗 khaoohs_ has joined #archiveteam
17:32 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
17:32 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
17:35 🔗 SketchCow It's (almost) to the point where I will write an automatic "make the nicest one the cover" for the whole collection.
17:37 🔗 schbirid has quit IRC (Leaving)
17:37 🔗 schbirid has joined #archiveteam
17:38 🔗 SketchCow I wrote it.
17:40 🔗 khaoohs has joined #archiveteam
17:42 🔗 SketchCow Oh yeah, look at it go.
17:42 🔗 SketchCow (It found a bunch I thought I'd upgraded, and I had not.)
17:42 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
17:42 🔗 khaoohs has joined #archiveteam
17:48 🔗 khaoohs has quit IRC (Ping timeout: 370 seconds)
17:54 🔗 hive-mind has quit IRC (Ping timeout: 260 seconds)
17:55 🔗 hive-mind has joined #archiveteam
18:39 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
18:42 🔗 dashcloud has joined #archiveteam
18:44 🔗 xtr-201 has quit IRC (Read error: Connection reset by peer)
18:49 🔗 aaaaaaaaa has quit IRC (Read error: Operation timed out)
18:49 🔗 scyther has joined #archiveteam
19:12 🔗 xmc ohhhyeah
19:20 🔗 underscor has quit IRC (Ping timeout: 370 seconds)
19:24 🔗 underscor has joined #archiveteam
19:24 🔗 swebb sets mode: +o underscor
19:30 🔗 BlueMaxim has quit IRC (Ping timeout: 512 seconds)
19:31 🔗 BlueMaxim has joined #archiveteam
19:47 🔗 SketchCow https://archive.org/details/archivebot is now at 99% sexy thumbnails (until the next set renders, of course)
19:48 🔗 SN4T14__ has joined #archiveteam
19:50 🔗 aaaaaaaaa has joined #archiveteam
19:51 🔗 lytv has quit IRC (Read error: Operation timed out)
19:51 🔗 lytv has joined #archiveteam
19:55 🔗 SN4T14_ has quit IRC (Ping timeout: 512 seconds)
20:18 🔗 schbirid has quit IRC (Leaving)
20:29 🔗 SketchCow the "add new sexy thumbnails" wasn't QUITE working, now it is.
20:33 🔗 SketchCow Yep, now it's working fine.
20:47 🔗 dashcloud has quit IRC (Read error: Operation timed out)
20:56 🔗 dashcloud has joined #archiveteam
20:58 🔗 khaoohs has joined #archiveteam
21:06 🔗 mistym has quit IRC (Remote host closed the connection)
21:08 🔗 SimpBrain has joined #archiveteam
21:09 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
21:09 🔗 khaoohs has joined #archiveteam
21:10 🔗 khaoohs_ has joined #archiveteam
21:10 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
21:11 🔗 khaoohs__ has joined #archiveteam
21:11 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
21:23 🔗 mistym has joined #archiveteam
21:33 🔗 khaoohs_ has joined #archiveteam
21:33 🔗 khaoohs__ has quit IRC (Read error: Connection reset by peer)
21:33 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
21:33 🔗 khaoohs_ has joined #archiveteam
21:34 🔗 khaoohs__ has joined #archiveteam
21:34 🔗 Wolfie has joined #archiveteam
21:34 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
21:35 🔗 khaoohs_ has joined #archiveteam
21:35 🔗 Wolfie You dicks, you made me log in to IRC to ask for a secret word to write something on a wiki so that I can ensure that furry porn persists for the future.
21:35 🔗 SketchCow Yes indeed
21:36 🔗 khaoohs__ has quit IRC (Read error: Connection reset by peer)
21:36 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
21:36 🔗 Wolfie Should I copy paste the ALL CAPS request for secret word in here or is that just a trap to make people laugh at me?
21:36 🔗 SketchCow http://i.huffpost.com/gen/1194885/thumbs/o-CHEERS-LEONARDO-DICAPRIO-570.jpg?5
21:36 🔗 raylee Wolfie, please do
21:36 🔗 SketchCow Yes, seconded
21:36 🔗 Wolfie *sigh*
21:36 🔗 Wolfie FINE.
21:36 🔗 Wolfie WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
21:36 🔗 raylee Wolfie, yahoosucks
21:36 🔗 SketchCow THY WORD IS "YAHOOSUCKS"
21:37 🔗 SketchCow GO FORTH AND SAVE THAT WHICH NEEDS SAVING
21:37 🔗 Wolfie (hurr hurr hurr everybody laughs)
21:37 🔗 raylee loltiming
21:37 🔗 SketchCow Spoiler, furry porn is going to persist.
21:37 🔗 SketchCow We have.... I think 4 people who specialize on it here
21:39 🔗 Wolfie I am well aware that furry porn is gonna persist, I just want to make sure that nobody gets their IP banhammered by some dude who likes inflation porn when they try to mirror it.
21:39 🔗 SketchCow You're our kind of savior.
21:39 🔗 SketchCow Cranky as fuck, but still doing it.
21:39 🔗 SketchCow Welcome.
21:39 🔗 SketchCow <--- jason scott
21:39 🔗 Wolfie Hi Jason.
21:44 🔗 Wolfie I'm gonna go out on a limb here and ping chfoo to ask him about what he wants to do with Furaffinity.
21:49 🔗 SketchCow We've been bonking Furraffinity for.... a while
21:50 🔗 Wolfie I bonked furaffinity personally when they disabled uploads for a full week and left a bizarre orc-creature in the nude on the front page. I got a few hundred thousand submissions before their hiatus ended and they IP banned me.
21:51 🔗 Wolfie Hence... wolfie@<hostname>:~$ curl www.furaffinity.net
21:51 🔗 Wolfie Your IP address has been banned.
21:51 🔗 Wolfie Reason: Mass scripted downloading of submissions.
21:51 🔗 arkiver So for last.fm
21:52 🔗 arkiver from what SketchCow wrote I understand that we need to save:
21:52 🔗 arkiver - the forums
21:52 🔗 arkiver - journals/user profiles
21:52 🔗 arkiver - blog
21:53 🔗 arkiver is that right? or do we have more?
21:59 🔗 kyan arkiver: fwiw, there's a list of things on Last.fm at the wiki page: http://archiveteam.org/index.php?title=Last.fm
22:00 🔗 scyther has quit IRC (Read error: Connection reset by peer)
22:16 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:16 🔗 philpem has quit IRC (Ping timeout: 260 seconds)
22:19 🔗 dashcloud has joined #archiveteam
22:29 🔗 Start arkiver: we should start a discovery project for friendfeed
22:29 🔗 Start we could use the search like we're doing with rapidshare
22:30 🔗 Start http://friendfeed.com/search?q=QUERY
22:30 🔗 arkiver the archivebot run didn't get it all?
22:31 🔗 Start it's currently blocked. besides, a warrior project will be way faster
22:31 🔗 Start for groups: http://friendfeed.com/groups/search?q=QUERY
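A discovery project like the one Start proposes starts by turning a list of query terms into search URLs. A sketch assuming the two FriendFeed URL patterns above; the helper name and any term list are hypothetical, and paging through results is not handled:

```python
from urllib.parse import quote_plus

SEARCH_TEMPLATES = {
    "entries": "http://friendfeed.com/search?q={}",
    "groups": "http://friendfeed.com/groups/search?q={}",
}

def discovery_urls(terms, kind="entries"):
    """Return one search URL per unique term, in first-seen order."""
    template = SEARCH_TEMPLATES[kind]
    unique_terms = dict.fromkeys(terms)  # dedupe while keeping order
    return [template.format(quote_plus(term)) for term in unique_terms]
```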
22:31 🔗 yipdw yeah now you can get multiple IPs banned instead of one
22:31 🔗 yipdw much faster
22:32 🔗 Sanqui archivebot really needs some way to resolve bans and move jobs to a different pipeline
22:32 🔗 Sanqui though that is a lot of work
22:32 🔗 Sanqui I'm thinking !move <job> <pipeline>, and !requeue <job> <status code>
22:32 🔗 yipdw I'm thinking people should just deal with it for now
22:33 🔗 Sanqui no, I mean, I am dealing with it, archivebot is awesome already
22:33 🔗 Sanqui just it would be nice to have that one day :p
22:37 🔗 yipdw a more serious response is that you can take the collected URLs so far and load those into a warrior project
22:37 🔗 yipdw there's ~4 million or so URLs in the wpull database for that job
22:37 🔗 arkiver Start: #lastchance.fm
22:37 🔗 SketchCow I was sent a note from a last.fm insider
22:37 🔗 yipdw that should keep things busy until April 9
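Seeding a warrior project from the ~4 million URLs already in the wpull database mostly means splitting that list into work items. A sketch of the batching step only; the item size is an arbitrary assumption (at 100 URLs per item, 4 million URLs make 40,000 items), and reading URLs out of wpull's database or loading items into a tracker is not shown:

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batch_urls(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Split a URL stream into fixed-size work items (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(urls)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```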
22:37 🔗 SketchCow We should really be moving on this.
22:38 🔗 arkiver chfoo: can you please add lastfm-grab to projects.json?
22:38 🔗 arkiver SketchCow: we'll get it
23:11 🔗 joepie91_ whoa
23:11 🔗 joepie91_ last.fm, what happen?
23:13 🔗 Kazzy new codebase, possible data loss
23:24 🔗 mistym has quit IRC (Remote host closed the connection)
23:24 🔗 mistym has joined #archiveteam
23:24 🔗 mistym has quit IRC (Remote host closed the connection)
23:27 🔗 Start when are twitpic and halo resuming?
23:33 🔗 DFJustin there's more than that on last.fm, every artist and track page has user comments
23:37 🔗 mistym has joined #archiveteam
23:38 🔗 primus104 has quit IRC (Leaving.)
23:55 🔗 Wolfie has quit IRC (Quit: Leaving.)
