#archiveteam-bs 2016-02-12,Fri


Time Nickname Message
00:03 πŸ”— mutoso has joined #archiveteam-bs
00:11 πŸ”— kyan has joined #archiveteam-bs
00:37 πŸ”— RichardG has quit IRC (Ping timeout: 250 seconds)
00:39 πŸ”— godane so we are down to 160tb of free space on IA
00:39 πŸ”— godane ?
01:01 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
01:02 πŸ”— wyatt8740 has joined #archiveteam-bs
01:17 πŸ”— SketchCow Possibly
01:18 πŸ”— godane i'm grabbing the sound cloud stuff from theblaze
01:19 πŸ”— JW_work21 has quit IRC (Read error: Operation timed out)
01:25 πŸ”— slyphic|a is now known as slyphic
01:38 πŸ”— RichardG has joined #archiveteam-bs
01:46 πŸ”— superkuh has quit IRC (Ping timeout: 252 seconds)
01:50 πŸ”— RichardG has quit IRC (Ping timeout: 633 seconds)
01:52 πŸ”— RichardG has joined #archiveteam-bs
01:52 πŸ”— JesseW has joined #archiveteam-bs
02:12 πŸ”— brayden_ has joined #archiveteam-bs
02:12 πŸ”— swebb sets mode: +o brayden_
02:15 πŸ”— vitzli has joined #archiveteam-bs
02:17 πŸ”— brayden has quit IRC (Read error: Operation timed out)
03:10 πŸ”— snape has joined #archiveteam-bs
04:19 πŸ”— kyan has quit IRC (Leaving)
04:27 πŸ”— kyan has joined #archiveteam-bs
04:41 πŸ”— Microguru has quit IRC (Read error: Connection reset by peer)
04:54 πŸ”— kyan has quit IRC (Leaving)
04:59 πŸ”— superkuh has joined #archiveteam-bs
05:06 πŸ”— JetBalsa has quit IRC (Read error: Connection reset by peer)
05:11 πŸ”— Infreq_ has quit IRC (Ping timeout: 252 seconds)
05:12 πŸ”— Infreq has joined #archiveteam-bs
05:24 πŸ”— acridAxid has quit IRC (Quit: marauder)
05:26 πŸ”— acridAxid has joined #archiveteam-bs
05:40 πŸ”— Sk1d has quit IRC (Ping timeout: 200 seconds)
05:46 πŸ”— Sk1d has joined #archiveteam-bs
05:52 πŸ”— JW_work2 has joined #archiveteam-bs
05:52 πŸ”— vitzli has quit IRC (Leaving)
05:55 πŸ”— JW_work2 has quit IRC (Client Quit)
06:31 πŸ”— nickname_ has quit IRC (Ping timeout: 300 seconds)
06:40 πŸ”— GLaDOS has quit IRC (Read error: Operation timed out)
06:41 πŸ”— GLaDOS has joined #archiveteam-bs
07:27 πŸ”— vitzli has joined #archiveteam-bs
07:39 πŸ”— yipdw has quit IRC (Ping timeout: 633 seconds)
07:57 πŸ”— JesseW has quit IRC (Quit: Leaving.)
08:06 πŸ”— yipdw has joined #archiveteam-bs
08:12 πŸ”— yipdw has quit IRC (Ping timeout: 246 seconds)
08:19 πŸ”— JesseW has joined #archiveteam-bs
08:43 πŸ”— RichardG has quit IRC (Ping timeout: 250 seconds)
09:01 πŸ”— schbirid has joined #archiveteam-bs
09:01 πŸ”— yipdw has joined #archiveteam-bs
09:18 πŸ”— JesseW TIL that some of IA's items lack a listed "uploader", e.g. https://archive.org/details/CaseofSp1940 (from the prelinger collection, uploaded in 2002 apparently)
09:25 πŸ”— vitzli some items have "is_darked": True and no metadata at all regarding files and checksums
09:26 πŸ”— vitzli I think I saw the one with no uploader too
09:49 πŸ”— JesseW Hm, I thought most with is_darked gave nothing for metadata.
09:49 πŸ”— JesseW vitzli: example?
09:51 πŸ”— JesseW https://github.com/jjjake/internetarchive/pull/123 <- neat little function I just wrote to display likely spam
10:01 πŸ”— vitzli JesseW, only 4 or 5 in IA.BAK collections, one example would be harvardclassics02elio
10:02 πŸ”— vitzli oops, correct key "is_dark"
10:03 πŸ”— vitzli another is Commodore_MicroComputer_Issue_37_1985_Sep_Oct
10:04 πŸ”— JesseW vitzli: https://archive.org/metadata/harvardclassics02elio looks like a normal item -- no is_dark visible
10:04 πŸ”— JesseW same with https://archive.org/metadata/Commodore_MicroComputer_Issue_37_1985_Sep_Oct
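The manual check in the exchange above (open the metadata URL, look for the dark flag) can be sketched in Python. This is only a minimal sketch: it assumes the metadata endpoint returns a JSON object and that a darked item carries a top-level `is_dark` key, as vitzli's corrected key name suggests. The helper names `metadata_url` and `looks_dark` and both sample records are made up for illustration and are not real API output.

```python
import json

# URL pattern taken from the links quoted in the log.
METADATA_BASE = "https://archive.org/metadata/"


def metadata_url(identifier):
    """Build the metadata URL for an item identifier."""
    return METADATA_BASE + identifier


def looks_dark(record):
    """Return True if a parsed metadata record has a truthy top-level is_dark key.

    Assumption: darked items expose "is_dark": true at the top level.
    """
    return bool(record.get("is_dark", False))


# Hypothetical records for illustration only.
census_entry = json.loads('{"is_dark": true}')
normal_entry = json.loads('{"metadata": {"identifier": "example"}, "files": []}')

print(metadata_url("harvardclassics02elio"))
print(looks_dark(census_entry), looks_dark(normal_entry))
```

Comparing a stored census entry against a fresh fetch of the same URL would show whether the flag changed between runs, which is the discrepancy being discussed here.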
10:05 πŸ”— vitzli hm
10:06 πŸ”— JesseW what data are you seeing it in?
10:06 πŸ”— JesseW the IA.BAK copies?
10:07 πŸ”— vitzli no, the census on IA.BAK items I did
10:08 πŸ”— JesseW hm, that's very odd
10:09 πŸ”— vitzli it's fine now
10:09 πŸ”— JesseW vitzli: pastebin your census entry for one of the ones above?
10:09 πŸ”— vitzli https://paste.ee/p/cKZFK
10:10 πŸ”— JesseW ah, yeah, I have seen that too
10:11 πŸ”— JesseW e.g. https://archive.org/metadata/0002Mistakes
10:12 πŸ”— JesseW but my list of darked ones doesn't include either of those
10:13 πŸ”— JesseW vitzli: when is that paste from?
10:13 πŸ”— JesseW i.e. when did you run your census?
10:14 πŸ”— vitzli 2016-02-03
10:14 πŸ”— JesseW that's bizarre, because https://archive.org/history/Commodore_MicroComputer_Issue_37_1985_Sep_Oct shows no changes since 2015-08
10:16 πŸ”— JesseW the created and uniq values are also different
10:17 πŸ”— vitzli maybe something went really wrong
10:19 πŸ”— JesseW yeah -- also, just as I was watching, a 2nd copy showed up
10:19 πŸ”— JesseW your data shows it only on ia902600 -- and that's what I saw a few minutes ago, but now it is also on ia802600
10:21 πŸ”— vitzli no is_dark items on internetarchivebooks collection
10:21 πŸ”— vitzli I'll drop my result and redo the census
10:21 πŸ”— JesseW vitzli: also, the created value in your data is from the same day you did it.
10:21 πŸ”— JesseW Feb 3
10:21 πŸ”— JesseW (the created value is in unix epoch)
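Reading a Unix epoch value as a calendar date is a one-liner, sketched below. The timestamp is an illustrative value chosen to fall on 2016-02-03 UTC; it is not the actual `created` value from the paste, and the helper name `epoch_to_utc_date` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc_date(created):
    """Convert a Unix epoch timestamp (in seconds) to an ISO date string in UTC."""
    return datetime.fromtimestamp(int(created), tz=timezone.utc).date().isoformat()


# Illustrative timestamp (2016-02-03 00:00:00 UTC), not the value from the paste.
print(epoch_to_utc_date(1454457600))
```

Using an explicit UTC timezone avoids the date shifting by a day depending on where the script runs, which matters when the participants are many hours apart.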
10:22 πŸ”— JesseW how many identifiers are you doing in your census?
10:22 πŸ”— JesseW BTW, I've uploaded some of mine to FOS -- it should get up to IA pretty soon now.
10:23 πŸ”— vitzli 142462 non-unique identifiers
10:23 πŸ”— vitzli 106054 uniques
10:23 πŸ”— JesseW I thought you were just doing the IA.BAK stuff -- that's much smaller.
10:23 πŸ”— vitzli I was going to redo census anyway, because of duplicates/missing ids
10:24 πŸ”— vitzli all items in all collections listed on iabak page
10:24 πŸ”— JesseW That's still much less than 10 million identifiers, I think.
10:24 πŸ”— vitzli I don't understand about 10 million part
10:25 πŸ”— JesseW Ah, that was a mistake by my eyes. I misread 106,054 as 10,060,054 or something like that.
10:26 πŸ”— JesseW Yeah, about a hundred thousand sounds about right for IA.BAK collections.
10:27 πŸ”— JesseW The total number generated by jake back in March 2015 was 14,926,080.
10:27 πŸ”— JesseW (with one duplicate, I think)
10:29 πŸ”— vitzli so 1% then, not bad considering it took less than an hour
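The numbers quoted above can be checked with a few lines of Python. This is a rough sketch: the `dedupe` helper and the sample identifiers are made up, while the three counts are the ones stated in the conversation. The share works out to a bit under the "1%" mentioned.

```python
def dedupe(identifiers):
    """Drop repeated identifiers, keeping first-seen order."""
    return list(dict.fromkeys(identifiers))


# Made-up identifiers, just to show the de-duplication step.
sample = ["item_a", "item_b", "item_a", "item_c", "item_b"]
unique = dedupe(sample)

# Counts quoted in the conversation.
non_unique_count = 142462
unique_count = 106054
full_census_count = 14926080

duplicate_rows = non_unique_count - unique_count
share_percent = 100 * unique_count / full_census_count

print(unique)
print(duplicate_rows, round(share_percent, 2))
```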
10:30 πŸ”— JesseW wow, that is quick
10:31 πŸ”— JesseW If you have the space, you might consider downloading the old (or my new, once it gets there) census data and extracting identifier lists from that, as it does include collections information
10:32 πŸ”— vitzli that would be the next step or one more step away, I still need to fix duplicates in my data
10:32 πŸ”— JesseW in any case, I should sleep
10:32 πŸ”— JesseW 2:30 AM where I am...
10:32 πŸ”— vitzli good night, it's 16:32 here
10:33 πŸ”— JesseW ah, just late afternoon
10:35 πŸ”— JesseW has quit IRC (Quit: Leaving.)
11:35 πŸ”— dan- has quit IRC (Read error: Operation timed out)
11:49 πŸ”— oldcad has joined #archiveteam-bs
12:10 πŸ”— vitzli JesseW, could you run ia-mine on the sahfgtb2007-10-31.sbd.flac item, please? I get an empty string with 'ia-mine', but 'ia metadata' returns JSON
13:03 πŸ”— dan- has joined #archiveteam-bs
13:35 πŸ”— nickname_ has joined #archiveteam-bs
13:42 πŸ”— Stilett0 has joined #archiveteam-bs
13:44 πŸ”— Stiletto has quit IRC (Read error: Operation timed out)
14:00 πŸ”— RichardG has joined #archiveteam-bs
14:10 πŸ”— GLaDOS has quit IRC (Ping timeout: 260 seconds)
14:10 πŸ”— GLaDOS has joined #archiveteam-bs
14:35 πŸ”— GLaDOS has quit IRC (Read error: Operation timed out)
14:37 πŸ”— GLaDOS has joined #archiveteam-bs
14:54 πŸ”— zyphlar_ has quit IRC (Quit: Connection closed for inactivity)
14:59 πŸ”— Start has quit IRC (Quit: Disconnected.)
15:48 πŸ”— Start has joined #archiveteam-bs
16:56 πŸ”— vitzli has quit IRC (Leaving)
17:01 πŸ”— JesseW has joined #archiveteam-bs
17:04 πŸ”— VADemon has joined #archiveteam-bs
17:07 πŸ”— Start has quit IRC (Quit: Disconnected.)
17:09 πŸ”— JesseW has quit IRC (Quit: Leaving.)
17:09 πŸ”— JW_work2 has joined #archiveteam-bs
17:20 πŸ”— JW_work2 has quit IRC (Leaving.)
17:35 πŸ”— godane SketchCow: i'm up to 2008-03-31 with kpfa
17:36 πŸ”— SketchCow Fantastic
17:37 πŸ”— JW_work2 has joined #archiveteam-bs
17:38 πŸ”— JesseW has joined #archiveteam-bs
17:40 πŸ”— JW_work2 has quit IRC (Client Quit)
17:42 πŸ”— schbirid attic init /mnt/backup/erotica
17:57 πŸ”— JesseW has quit IRC (Quit: Leaving.)
17:57 πŸ”— slyphic is now known as slyphic|a
18:12 πŸ”— SmileyG haha abusing youtube as storage sounds fun D:
18:12 πŸ”— SmileyG antomatic: hmmm drat, true.
18:12 πŸ”— ersi put a cloud in a cloud in a cloud
18:12 πŸ”— SmileyG but contentid stuff is generally _already_ at some somewhere
18:13 πŸ”— SmileyG at least somewhere*
18:13 πŸ”— weles has joined #archiveteam-bs
18:13 πŸ”— antomatic problem is there's no way to tell - so if you upload 30,000 files and more than 3 of them get flagged/taken down, they're all lost
18:14 πŸ”— antomatic unless you distribute it across lots and lots of channels
18:14 πŸ”— SmileyG oh right
18:14 πŸ”— SmileyG hmmm
18:14 πŸ”— antomatic or unless everything is encoded in such a way that the representation is not, in itself, a copyvio
18:14 πŸ”— SmileyG cept I thought I had more than 3
18:14 πŸ”— SmileyG because lots of gaming stuff gets flagged
18:14 πŸ”— arkiver convert bits to dots, add as frames for a video
18:14 πŸ”— antomatic not that it'd stop google if they thought that people were abusing their generous free peanut offer
18:15 πŸ”— arkiver and add some repair stuff in case a video is deleted
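The "bits to dots" idea can be sketched without any video tooling by treating each frame as a small grid of 0/1 cells. This is only an illustration under stated assumptions: the grid size, the function names and the zero padding are arbitrary choices, no actual video is produced, and the "repair stuff" (some form of redundancy across videos) is left out entirely.

```python
def bytes_to_frames(data, width=8, height=4):
    """Split data into frames, each a height x width grid of 0/1 'dots'."""
    # Most significant bit first for every byte.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    per_frame = width * height
    # Pad the last frame with zeros so every frame is full.
    bits += [0] * (-len(bits) % per_frame)
    frames = []
    for start in range(0, len(bits), per_frame):
        chunk = bits[start:start + per_frame]
        frames.append([chunk[r * width:(r + 1) * width] for r in range(height)])
    return frames


def frames_to_bytes(frames, length):
    """Read the dots back row by row and rebuild the first `length` bytes."""
    bits = [bit for frame in frames for row in frame for bit in row]
    out = bytearray()
    for i in range(0, length * 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


payload = b"archive"
frames = bytes_to_frames(payload)
restored = frames_to_bytes(frames, len(payload))
print(len(frames), restored == payload)
```

A real version would need dots large enough to survive lossy video compression, plus the original length stored somewhere, since the padding is otherwise indistinguishable from data.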
18:15 πŸ”— antomatic actually contentID flags are OK, it's takedowns that count for the '3 strikes' purposes.
18:15 πŸ”— SmileyG Normally mine just says 'this content is not allowed in some countries' or something :/
18:15 πŸ”— SmileyG Dear google, please archive soundcloud, kthx
18:15 πŸ”— SmileyG Ah ok nope, no takedowns on me
18:26 πŸ”— PurpleSym SketchCow: Can you have a look at https://archive.org/details/microsoft_word_5.5_german before I upload more DOS software? And yes, I fucked up mediatype. Can this be fixed?
18:27 πŸ”— SketchCow Fixed it.
18:27 πŸ”— SketchCow Other than mediatype, it worked.
18:31 πŸ”— PurpleSym Thanks.
18:31 πŸ”— SketchCow Screenshots displaying under it are this tricky stupid business.
18:31 πŸ”— SketchCow basically, make the screenshot a GIF and name it <item name>.gif
18:32 πŸ”— SketchCow Upload that, it'll be under it.
18:32 πŸ”— SketchCow Or I can run my screenshotter on it.
18:32 πŸ”— PurpleSym I'll convert the .png
18:35 πŸ”— PurpleSym Hm, Word itself does not seem to work though. Stuck at the copyright dialog for me.
18:45 πŸ”— Stilett0 has quit IRC (Read error: Operation timed out)
18:48 πŸ”— Start has joined #archiveteam-bs
18:50 πŸ”— beardicus has quit IRC (bye)
19:00 πŸ”— SketchCow Really? I got right in.
19:00 πŸ”— mismatch has joined #archiveteam-bs
19:01 πŸ”— phuzion has quit IRC (Remote host closed the connection)
19:02 πŸ”— phuzion has joined #archiveteam-bs
19:02 πŸ”— Start has quit IRC (Quit: Disconnected.)
19:18 πŸ”— PurpleSym Firefox 38 here.
19:20 πŸ”— PurpleSym Nope, mouse not working, fullscreen blanks the window.
19:22 πŸ”— HCross PurpleSym, crashed Opera
19:24 πŸ”— MrRadar Works on Chrome 48 and Firefox 44 for me
19:25 πŸ”— PurpleSym Ok, thanks for testing.
19:25 πŸ”— SimpBrain works chrome 47 linux
19:25 πŸ”— Start has joined #archiveteam-bs
19:36 πŸ”— RichardG has quit IRC (Ping timeout: 492 seconds)
19:46 πŸ”— Stiletto has joined #archiveteam-bs
20:02 πŸ”— RichardG has joined #archiveteam-bs
20:37 πŸ”— slyphic|a is now known as slyphic
20:43 πŸ”— atlogbot has joined #archiveteam-bs
20:45 πŸ”— Start has quit IRC (Quit: Disconnected.)
20:48 πŸ”— signius has quit IRC (Read error: Operation timed out)
20:59 πŸ”— signius has joined #archiveteam-bs
20:59 πŸ”— weles has quit IRC (Read error: Operation timed out)
21:06 πŸ”— mismatch has quit IRC (Ping timeout: 260 seconds)
21:38 πŸ”— bzc6p has joined #archiveteam-bs
21:38 πŸ”— swebb sets mode: +o bzc6p
21:47 πŸ”— godane SketchCow: i'm grabbing metadata that could be used to grab pacifica Radio Archives
21:49 πŸ”— bzc6p So the other day I wrote my first "please let us archive the content of your site before it possibly all gets deleted" letter, and I got to experience being upset at being refused.
21:49 πŸ”— bzc6p Especially cute that the reply was doubly ambiguous thanks to my beautiful language.
21:51 πŸ”— SimpBrain heh dialogue is the first step towards something good
21:51 πŸ”— bzc6p The transcript would be like "we're (unable|unwilling) to [let you] archive our site", so it may have four meanings, but basically all say GTFO.
21:51 πŸ”— SimpBrain im in discussion with the friendsreunited guy, currently going good
21:51 πŸ”— bzc6p Being polite enough, however, to say "thank you for the suggestion" and "we ask for your understanding".
21:51 πŸ”— SimpBrain got a long email to create tomorrow with ideas and such
21:52 πŸ”— bzc6p Good luck for that.
21:52 πŸ”— SimpBrain yeah a helpful tone goes a long way
21:53 πŸ”— bzc6p I wrote a very kind and long letter too. But what to do if it doesn't match their business policy.
21:54 πŸ”— SimpBrain you win some, you lose some
21:54 πŸ”— bzc6p Also, one can't do anything to archive this site, as it's been down since July. They say it's a database problem but stuff is not lost. And, most importantly,
21:54 πŸ”— bzc6p WE ARE CONTINUOUSLY WORKING ON IT
21:54 πŸ”— bzc6p Yeah, since July. Continuously.
21:55 πŸ”— bzc6p Sorry for flooding with this, but it felt good to tell my first-time experience
21:56 πŸ”— SimpBrain either the company wants to do something helpful, or they just want to trash it
21:56 πŸ”— bzc6p especially when upset about 20 million photos probably being lost while they pretend "Oh, don't worry, we'll fix it."
22:02 πŸ”— bzc6p I hate such temporary states.
22:05 πŸ”— bzc6p Their conscience may not let them say "Well, we'll just delete all your stuff", but they don't want to bother fixing the database. (Can a database go sooo wrong that they say every day "yuck, maybe tomorrow"?)
22:05 πŸ”— bzc6p Sometimes I say that too. The difference is that I'm not sitting on tens of terabytes of user data.
22:06 πŸ”— bzc6p Okay, I'm done with my outrage.
22:08 πŸ”— slyphic is now known as slyphic|a
22:14 πŸ”— snape bzc6p, I once had to try to migrate a forum between an eight-year-old long-abandoned software and something newer. MySQL DB was a mere 600MB or so. Each attempt to convert it over into a newer more standard format took about seventy hours on a middle-of-the-road dedicated server. And it took about ten tries to get it right, not gonna lie. >.<
22:15 πŸ”— espes__ has quit IRC (Read error: Operation timed out)
22:17 πŸ”— godane SketchCow: starting to upload 2008-04 of kpfa
22:20 πŸ”— bzc6p Those who have run a site for ten years should have the knowledge and time to do that. Also, it seems they could just re-create a new database from the directory structure (maybe some info would be lost but at least images could be shown per user). Also, WHERE IS THEIR DATABASE BACKUP? Also, why haven't they done anything for months?
22:22 πŸ”— SN4T14 has joined #archiveteam-bs
22:22 πŸ”— SN4T14 has quit IRC (Connection closed)
22:22 πŸ”— bzc6p I can't see any viable excuse. I consider this a nasty way of shutting down.
22:23 πŸ”— * joepie91 agress
22:23 πŸ”— joepie91 agrees*
22:25 πŸ”— Stiletto has quit IRC (Read error: Connection reset by peer)
22:25 πŸ”— bzc6p I am agress.
22:26 πŸ”— Stiletto has joined #archiveteam-bs
22:29 πŸ”— snape Give 'em grudging respect for not selling the "assets" (i.e. user info) to some sleazy marketing corp, at least. Cough, Myspace, cough.
22:30 πŸ”— SimpBrain userdata is $$$
22:39 πŸ”— bzc6p has left
22:47 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
23:00 πŸ”— bzc6p has joined #archiveteam-bs
23:00 πŸ”— swebb sets mode: +o bzc6p
23:03 πŸ”— bzc6p I know I've spoken enough about that service, but there are a few more laughable details I just discovered and I can't help sharing them.
23:05 πŸ”— bzc6p 1. I wrote directly to the company's contact: the company name was on the main page (now contains just plain text about the error) as kind of a signature. Right after they received my mail, they removed their company name from there.
23:05 πŸ”— bzc6p 2. The only contact info left there now is just an email address, which, by the way, is not working.
23:06 πŸ”— * bzc6p doesn't know whether to laugh or to cry
23:09 πŸ”— bzc6p Maybe it's time to tell I speak about fotoalbum.hu
23:10 πŸ”— Stiletto has quit IRC (Read error: Operation timed out)
23:11 πŸ”— bzc6p joepie91: I thought you were the most quick-tempered here in ArchiveTeam. Maybe not.
23:11 πŸ”— bzc6p Sorry everyone again for writing too much. I'll try not to appear for a while.
23:12 πŸ”— joepie91 bzc6p: don't worry about it, I think we all silently share the frustration :p
23:13 πŸ”— * xmc nods quietly
23:13 πŸ”— xmc i wrote to a url shortener years ago about sharing their db
23:13 πŸ”— xmc they said "fuck off i'm running this forever"
23:13 πŸ”— joepie91 they went bust months later?
23:14 πŸ”— xmc six months ago i registered the domain because they let it lapse and nobody cared enough to squat it
23:14 πŸ”— bzc6p It seems all of us have our own stories.
23:18 πŸ”— swebb url shorteners are a dime a dozen.
23:18 πŸ”— xmc lispurl.com was unique :(
23:20 πŸ”— swebb Sure it was. :)
23:21 πŸ”— xmc you'd get things like http://lispurl.com/caadadaaadddar
23:23 πŸ”— swebb Here's my list from about a year ago that I created from the twitter stream: https://gist.github.com/scumola/5216839
23:23 πŸ”— swebb That's the top-1000 url shorteners used on twitter in 2014.
23:23 πŸ”— swebb I'd bet money that 1/2 of them are not working anymore.
23:27 πŸ”— nickname_ has quit IRC (Ping timeout: 300 seconds)
23:29 πŸ”— swebb I like this one - really short: http://v.ht/
23:35 πŸ”— bzc6p has left
23:35 πŸ”— snape Not sure if anyone's interested, but there's a photo/file host in the gulf with no discernable download limits and sequential numeric IDs; mrkzgulf.com/do.php?img=NNNNNN, where NNNNNN is roughly 190000 or below. Most pics appear to be Daesh propaganda; most non-pic files appear to be split rars of older scene movie releases. O.o
23:39 πŸ”— nickname_ has joined #archiveteam-bs
23:44 πŸ”— xmc hm, that could be a good idea to snag
23:44 πŸ”— xmc a single-person wpull project, even :P
23:48 πŸ”— snape I did an exploratory scrape of IDs 187950-189950 just for the hell of it (2k items), and it wound up being about 23GB, I think the total was. If my math is right that'd make the whole thing probably under 2TB altogether, perhaps a lot less depending on what the starting ID is, which I haven't explored.
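The size estimate can be reproduced as a straight linear extrapolation from the sample. The sketch below assumes the sample is representative and that every ID from 1 up to roughly 190000 exists, neither of which has been verified; under those assumptions the total lands slightly above 2 TB, so the "under 2TB" figure depends on the starting ID being higher or older files being smaller. The function name and the alternative starting ID are hypothetical.

```python
def estimate_total_gb(sample_items, sample_gb, first_id, last_id):
    """Linearly extrapolate total size from a sample, assuming uniform item size."""
    item_count = last_id - first_id + 1
    return sample_gb * item_count / sample_items


# Figures from the log: about 2,000 items came to about 23 GB, IDs top out near 190000.
full_range_gb = estimate_total_gb(2000, 23, first_id=1, last_id=190000)

# Hypothetical case: IDs only start at 50000 (the real starting ID is unexplored).
later_start_gb = estimate_total_gb(2000, 23, first_id=50000, last_id=190000)

print(round(full_range_gb), round(later_start_gb))
```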
23:50 πŸ”— SN4T14 has joined #archiveteam-bs
23:50 πŸ”— SN4T14 has quit IRC (Connection closed)
