#archiveteam-bs 2017-12-22,Fri


Time Nickname Message
00:00 🔗 Jon has quit IRC (Quit: ZNC - http://znc.in)
00:01 🔗 RichardG_ has joined #archiveteam-bs
00:01 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
00:06 🔗 icedice2 has quit IRC (Read error: Operation timed out)
00:17 🔗 robink has quit IRC (Quit: No Ping reply in 210 seconds.)
00:19 🔗 tuluu_ has quit IRC (Quit: No Ping reply in 180 seconds.)
00:20 🔗 kristian_ has quit IRC (Quit: Leaving)
00:21 🔗 tuluu has joined #archiveteam-bs
00:22 🔗 robink has joined #archiveteam-bs
00:29 🔗 dd0a13f37 has quit IRC (Quit: Connection closed for inactivity)
00:33 🔗 robink has quit IRC (Read error: Connection reset by peer)
00:35 🔗 zgrant has quit IRC (Quit: Leaving.)
00:35 🔗 zgrant has joined #archiveteam-bs
00:37 🔗 robink has joined #archiveteam-bs
00:47 🔗 robink has quit IRC (Read error: Connection reset by peer)
00:52 🔗 robink has joined #archiveteam-bs
01:06 🔗 godane SketchCow: i'm starting to upload my random captures of qvc japan
01:21 🔗 kyounko has quit IRC (Read error: Operation timed out)
01:23 🔗 Ceryn has joined #archiveteam-bs
01:28 🔗 wp494 has quit IRC (Read error: Operation timed out)
01:34 🔗 zgrant has quit IRC (Quit: Leaving.)
01:34 🔗 zgrant has joined #archiveteam-bs
01:35 🔗 zgrant has quit IRC (Client Quit)
01:35 🔗 wp494 has joined #archiveteam-bs
01:35 🔗 zgrant has joined #archiveteam-bs
01:38 🔗 SketchCow hurrah
01:40 🔗 godane i'm going to capture a ton of qvc japan over the next few days
01:46 🔗 wp494_ has joined #archiveteam-bs
01:49 🔗 wp494 has quit IRC (Read error: Operation timed out)
02:14 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
02:25 🔗 wp494_ is now known as wp494
03:15 🔗 Odd0002 has quit IRC (Ping timeout: 600 seconds)
03:24 🔗 Odd0002 has joined #archiveteam-bs
03:34 🔗 Odd0002 has quit IRC (Ping timeout: 506 seconds)
03:44 🔗 Odd0002 has joined #archiveteam-bs
04:24 🔗 wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
04:25 🔗 wp494 has joined #archiveteam-bs
04:39 🔗 Odd0002 has quit IRC (Ping timeout: 248 seconds)
04:55 🔗 qw3rty118 has joined #archiveteam-bs
04:56 🔗 BlueMaxim has joined #archiveteam-bs
05:01 🔗 qw3rty117 has quit IRC (Read error: Operation timed out)
05:17 🔗 zgrant has left
05:30 🔗 Odd0002 has joined #archiveteam-bs
06:17 🔗 Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
06:19 🔗 Odd0002 has joined #archiveteam-bs
06:32 🔗 Valentin- has joined #archiveteam-bs
06:33 🔗 Valentine has quit IRC (Ping timeout: 506 seconds)
06:45 🔗 jschwart has joined #archiveteam-bs
06:50 🔗 jschwart has quit IRC (Client Quit)
07:18 🔗 Mateon1 has quit IRC (Ping timeout: 260 seconds)
07:18 🔗 Mateon1 has joined #archiveteam-bs
07:32 🔗 kimmer12 has joined #archiveteam-bs
07:38 🔗 kimmer1 has quit IRC (Ping timeout: 633 seconds)
07:42 🔗 kimmer1 has joined #archiveteam-bs
07:48 🔗 kimmer12 has quit IRC (Ping timeout: 633 seconds)
07:53 🔗 kimmer1 has quit IRC (Ping timeout: 633 seconds)
08:06 🔗 schbirid has joined #archiveteam-bs
10:15 🔗 odemg SketchCow, godane could either of you put 30 new 750GB 2.5" Momentus Hybrid SSHD Drives to use? (on at/ia related things)
10:28 🔗 jrwr odemg: yes, make a rsync box
10:36 🔗 Mateon1 has quit IRC (Remote host closed the connection)
10:36 🔗 Mateon1 has joined #archiveteam-bs
10:43 🔗 odemg jrwr, yeah I'll be getting these to whoever can do something like that with them, document what they did and talk about it in a r/DataHoarder post
10:44 🔗 schbirid nice
11:03 🔗 pizzaiolo has joined #archiveteam-bs
11:23 🔗 Igloo odemg: Interesting. I'll happily upfront most of a server build for an rsync target in the US
11:23 🔗 Igloo Ideally in California
11:23 🔗 ranma fuck
11:23 🔗 ranma i hate it when i miss some upload to a youtube channel i've been following
11:24 🔗 ranma *when an upload gets deleted
11:24 🔗 odemg Igloo, exactly what we need
11:24 🔗 Igloo I'm not US based though, We'd need someone to help look at colo
11:25 🔗 Specular has joined #archiveteam-bs
11:27 🔗 Specular so I've read flash memory cards have a terrible shelf life for retaining data without corruption. Hope my two year old backup of something isn't fucked. Will be buying a HDD tomorrow and transfer the contents.
11:27 🔗 jrwr Igloo: there are a few out there, the best is if we can get hands on it there at the colo
11:36 🔗 Specular btw, for hdd brands it seems HGST is more reliable in those famous BackBlaze results, but it's harder to compare to WD drives since they lack the same volume. I've always used WD without problems, but is HGST more reliable for long-term storage (even their 2.5" drives for ex)?
11:50 🔗 jrwr Omfg this is a thing now https://www.reddit.com/r/spacex/comments/7lez5n/elon_musks_midnight_cherry_tesla_roadster/
11:51 🔗 jrwr That's the cat ON the payload mount
11:51 🔗 jrwr Car
12:11 🔗 Igloo jrwr: Yep. We'd need someone who either lives local to be able to do it or some sort of IPKVM / power bar solution
12:11 🔗 HCross2 hmm
12:11 🔗 Igloo jrwr sending the car to mars huh, interesting
12:12 🔗 HCross2 I know where.. but they arent cheap at all
12:12 🔗 Igloo Define not cheap
12:14 🔗 HCross2 not sure of an exact figure. I was thinking Psychz but they arent cheap
12:16 🔗 Igloo General Button Pressing $0
12:16 🔗 Igloo How much is specific button pressing in order?
12:21 🔗 HCross2 a million quid
12:21 🔗 HCross2 per button
12:21 🔗 HCross2 extra if the button is very deep
12:21 🔗 HCross2 £5 million if it needs a paperclip
12:34 🔗 icedice has joined #archiveteam-bs
12:36 🔗 Igloo I've asked for pricing for a total of 4u HCross2. 2 x 2u SuperMicro chassis, one 24 bay for storage / s3 and one 16 bay with disks and a load of NVME drives for the megawarc factory
12:36 🔗 Igloo If you then back to back the servers over 10G it'll be stupid fast.
12:36 🔗 HCross2 ah nice
12:37 🔗 Igloo Lets see. Be a good case and maybe a chance to use some of the crowd sourced funds that asparagirl was looking at
12:49 🔗 icedice has quit IRC (Read error: Connection reset by peer)
12:55 🔗 icedice has joined #archiveteam-bs
13:06 🔗 godane SketchCow: so when is your new box of tapes getting shipped?
13:07 🔗 godane to me
13:10 🔗 jrwr Oh man a new FOS
13:22 🔗 JAA Specular: Backblaze's stats are interesting but essentially irrelevant for almost every use case. The drives are way outside the specs in those storage pods...
13:23 🔗 JAA jrwr: Yeah, Elon's sending the car because nobody wanted to send a real payload due to the risk involved. At least that's what I read.
13:23 🔗 jrwr that is correct
13:24 🔗 jrwr im 100% happy that he IS sending something cool
13:24 🔗 Specular JAA, by that do you mean that they're being used far more than intended?
13:25 🔗 JAA Specular: They're exposed to way more vibration, in particular.
13:25 🔗 JAA Because there are so many other drives nearby.
13:26 🔗 JAA Most of the drives they used aren't even rated for NAS usage. And if they are, it's usually only for up to 6 drives or something like that.
13:26 🔗 JAA they use*
13:28 🔗 Specular it's interesting that I haven't seen that brought up in discussion of BB's results, but vibration would be a considerable factor for sure
13:29 🔗 JAA It's mentioned every time someone posts the stats on /r/DataHoarder, at least.
13:47 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:35 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
14:48 🔗 Stilett0 has joined #archiveteam-bs
15:50 🔗 ola_norsk has joined #archiveteam-bs
15:52 🔗 MrDignity has quit IRC (Remote host closed the connection)
15:58 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
16:18 🔗 Odd0002 has joined #archiveteam-bs
16:56 🔗 zgrant has joined #archiveteam-bs
17:00 🔗 Specular has quit IRC (Quit: Leaving)
17:36 🔗 icedice has quit IRC (Read error: Connection reset by peer)
17:41 🔗 beardicus has quit IRC (bye)
17:45 🔗 beardicus has joined #archiveteam-bs
18:01 🔗 zgrant has quit IRC (Quit: Leaving.)
18:04 🔗 ez JAA: its partially a datahoarder meme (a bit like ECC on ZFS). what BB actually said is: we dont know if vibration of a lot of nearby drives matters (it probably does), what we DO know is that consumer vs "nas rated" enterprise BOTH fail at the same rate in "hostile pod environment" - https://www.backblaze.com/blog/enterprise-drive-reliability/
18:05 🔗 ola_norsk Odd0002: in an sqlite db, would having a separate 'text' content table, to prevent storing identical texts in another table, be beneficial? Or could the operation of keeping an index of it all make it slower?
18:05 🔗 ola_norsk Odd0002: i'm guessing quite a huge percentage of tweets contain the exact same text content
18:06 🔗 JAA ez: Right. They don't have many enterprise drives though. I'd like to see a proper statistical analysis of the data.
18:06 🔗 JAA The uncertainty range on the enterprise drives would be huge.
18:06 🔗 JAA (Or confidence interval, if we want to go by statistical lingo.)
18:07 🔗 ez JAA: its also quite possible that the ent rated drives ARE better, when subjected to milder conditions
18:07 🔗 ez and theres no difference only when subjected to extremes
18:07 🔗 JAA True
18:09 🔗 JAA I just remembered this analysis, by the way: https://hackernoon.com/applying-medical-statistics-to-the-backblaze-hard-drive-stats-36227cfd5372
18:09 🔗 ez anyhow, given that price per gb drops by what, 15%-20% a year? i guess the higher MTBF could be worth the additional density
18:09 🔗 ez s/higher/worse/
18:09 🔗 JAA Not sure if that includes any enterprise drives, too lazy to check all the model numbers right now.
18:10 🔗 JAA Haha, price drops per GB, I wish...
18:10 🔗 ez it does, its just a flattening-out sigmoid
18:10 🔗 JAA Prices here haven't changed much in years. It only started again in the recent months.
18:10 🔗 ez it is approaching limit at ever decreasing rate, but it does drop
18:10 🔗 JAA Here != US, just in case.
18:11 🔗 JAA I'm curious to see what the next few years will bring though with MAMR and HAMR.
18:11 🔗 ez yea, theres a lot of weird markup and outright exotic market events
18:11 🔗 ez like those 4-6TB drives in USB frames
18:11 🔗 ez being way cheaper than the same thing, standalone
18:11 🔗 JAA Yeah
18:13 🔗 JAA One random article from Germany mentions that hard drive prices dropped from 0.09€ to 0.06€ per GB between 2012 and 2017. So apparently it did drop a bit (I didn't really notice that, but I also haven't bought many HDDs in the last few years), but definitely not 15-20% per year.
18:14 🔗 JAA SSDs dropped from 0.99€ to 0.17€ over the same timeframe, by the way.
18:16 🔗 ez JAA: ssd and hdd are on different part of the sigmoid
18:16 🔗 JAA Yeah yeah, I know.
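The per-GB prices JAA quotes from the German article can be turned into a compound yearly rate, which is the number to compare against ez's "15%-20% a year". A minimal sketch, assuming the 2012 and 2017 figures are exactly five years apart; the function name is made up for illustration.

```python
def annual_change(start_price, end_price, years):
    """Compound annual rate of change between two prices (negative = drop)."""
    return (end_price / start_price) ** (1.0 / years) - 1.0


# Figures quoted in the log, in EUR per GB, 2012 -> 2017 (assumed 5 years).
hdd_rate = annual_change(0.09, 0.06, 5)
ssd_rate = annual_change(0.99, 0.17, 5)

print(f"HDD: {hdd_rate:.1%} per year")
print(f"SSD: {ssd_rate:.1%} per year")
```

Under that assumption the HDD figures work out to a drop of roughly 8% per year and the SSD figures to roughly 30% per year, which fits JAA's reading that HDD prices fell, but not by 15-20% annually.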
18:17 🔗 kimmer1 has joined #archiveteam-bs
18:19 🔗 ez JAA: as for MAMR, that alone is supposed to flatten the sigmoid too, to the point it will be comparable to ssd perhaps
18:20 🔗 ez trouble being of course nobody does mamr yet, and second, mamr writes are slow, like really slow
18:21 🔗 ez https://regmedia.co.uk/2017/10/12/wdc_mamr_hdd_vs_ssds.jpg
18:21 🔗 ez its WD marketing department, so i'd take it with grain of salt, but doesnt smell of complete bullshit either
18:24 🔗 ola_norsk Odd0002: here's the database 'schema' i have now https://imgur.com/a/gNxNh , could it be done even better perhaps?
18:25 🔗 jschwart has joined #archiveteam-bs
18:26 🔗 ez ola_norsk: why splitting the tweet text?
18:26 🔗 ez also, you need a table of who-follows-who to make that data useful
18:27 🔗 JAA ez: I'm just hoping that the prices actually decrease in Europe again as well.
18:27 🔗 JAA prices per GB, that is.
18:27 🔗 ez EU retail prices are kinda insane, yea
18:27 🔗 ez of everything computer basically
18:28 🔗 JAA Here's a compilation of prices per GB over the last ~20 years: https://blog.tralios.de/wp-content/uploads/2016/03/Festplattenpreise2016.png
18:28 🔗 JAA In Germany
18:28 🔗 ola_norsk ez: the separate text/content table i think could prevent storing identical tweet content
18:28 🔗 ez i call it "vat abuse"
18:28 🔗 ez sure, we DO have vat
18:28 🔗 JAA Prices now are essentially the same as ca. 2010.
18:28 🔗 ez but that does not explain the 30% markup on top of that
18:28 🔗 ez most of places have
18:28 🔗 JAA What does it have to do with VAT?
18:28 🔗 ola_norsk ez: i'm not sure though, but e.g tweets containing just 'LOL!!' etc
18:29 🔗 ola_norsk ez: instead of storing a multiple of 'LOL!!" tweets, i mean
18:29 🔗 ez JAA: consumers who compare prices with US think "oh, thats just VAT, thats why its more expensive in EU"
18:30 🔗 ez but the reality is that we have simply much higher markups, too
18:31 🔗 JAA Ah, yeah.
18:32 🔗 ez ola_norsk: the bloated index wont make up for such a "compression". you do, however, want to store retweets as reference in some way
18:33 🔗 ez retweets are usually stored as separate table of 'tweetid, whoretweetedit'. its somewhat awkward as you dont get a "timeline" of events in one table, but it is compact and query that way
18:34 🔗 ez s/query/fast to query/
18:35 🔗 ola_norsk ez: hmmm..might be able to pick out '@username' from the tweet texts. I think tweep does pick the first '@' as sender, then include the others in content text
18:36 🔗 ez oh yea, indexing threads is nice to have.
18:36 🔗 ez basically for each tweet you need to get 1) whom it refers to (multiple) 2) who retweeted it (multiple)
18:37 🔗 ez but things like the actual author of a tweet can be trivially part of the tweet itself, as well the other data you have now separate
18:38 🔗 ola_norsk i have to go by the output of 'tweep' for now i think https://ia801505.us.archive.org/24/items/tweeptestcrash/tweets.txt
18:42 🔗 ez ola_norsk: yea, with that its easier to just store one line = one row
18:42 🔗 ez especially if you use sqlite and usernames can be indexed simply by the text
18:43 🔗 ola_norsk ok
18:44 🔗 ez the retweet stuff is important only when doing full scrape. hashtag search will show only original tweets
18:44 🔗 ez so no way the entries could be duplicate
18:46 🔗 ola_norsk ez: what i mean if theres found multiple tweets containing the exact same text content (example "43942641632403456 2017-12-21 20:33:58 CET <OfficialFayBla> Ur welcome "
18:46 🔗 ez ola_norsk: also try NetNeutralty and NetNeutralty
18:46 🔗 ola_norsk ez if some other tweet also is just "Ur welcome "
18:46 🔗 ez the typos virtually always correspond to legibility of the users, ie you get cher-like tweets
18:47 🔗 ez NetNetruality and NetNeutralty i mean
18:47 🔗 ez maybe there are other common typos
18:47 🔗 ez ola_norsk: again, its "deduping" a problem which isnt
18:47 🔗 ola_norsk ez: but could storing unique texts prevent a lot of duplicates?
18:47 🔗 ola_norsk ok
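The layout ez describes (one tweep line = one row, usernames indexed as plain text, retweets in a separate 'tweetid, whoretweetedit' table, no text deduplication) could look roughly like this in SQLite. This is only a sketch: the line format is inferred from the single example line quoted in the channel, and every table, column and function name here is an assumption, not tweep's or ola_norsk's actual schema.

```python
import re
import sqlite3

# Assumed line layout, inferred from the one example quoted in the log:
#   <tweet id> <date> <time> <timezone> <<username>> <text>
LINE_RE = re.compile(
    r"^(\d+) (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\S+) <([^>]+)> (.*)$"
)


def parse_line(line):
    """Split one output line into a row tuple, or return None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    tweet_id, date, time, tz, user, text = match.groups()
    return int(tweet_id), f"{date} {time}", tz, user, text


def make_db(path=":memory:"):
    """Create the two tables: one row per tweet, plus a compact retweet table."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tweets (
            id        INTEGER PRIMARY KEY,
            posted_at TEXT NOT NULL,
            tz        TEXT NOT NULL,
            username  TEXT NOT NULL,
            text      TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS tweets_by_user ON tweets (username);
        CREATE TABLE IF NOT EXISTS retweets (
            tweet_id     INTEGER NOT NULL REFERENCES tweets (id),
            retweeted_by TEXT NOT NULL,
            PRIMARY KEY (tweet_id, retweeted_by)
        );
        """
    )
    return conn


def insert_lines(conn, lines):
    """Insert every parseable line; returns how many lines parsed."""
    rows = [row for row in map(parse_line, lines) if row is not None]
    # The tweet id is the primary key, so re-importing the same line is a no-op.
    conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Two different tweets that both say "Ur welcome" simply become two rows. Identical text is stored twice, which is the trade-off ez argues for: the extra index needed to deduplicate text would likely cost more than the repeated strings.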
18:55 🔗 kimmer12 has joined #archiveteam-bs
18:58 🔗 odemg has quit IRC (Remote host closed the connection)
19:01 🔗 kimmer1 has quit IRC (Ping timeout: 633 seconds)
19:02 🔗 kimmer1 has joined #archiveteam-bs
19:03 🔗 icedice has joined #archiveteam-bs
19:05 🔗 ola_norsk after looking through a couple of tweeted links, it seems sending them all to the Wayback Machine might cause some 'dubious' sites and pictures to be stored; Is that a problem? :]
19:06 🔗 ola_norsk dubious, as in adult content of various sorts :D
19:06 🔗 ola_norsk e.g what's the policy of IA of storing nudes?
19:08 🔗 * ola_norsk don't exactly want to be banned for piping pr0n :D
19:09 🔗 kimmer12 has quit IRC (Read error: Operation timed out)
19:12 🔗 kristian_ has joined #archiveteam-bs
19:13 🔗 ola_norsk if _accidentally_ waybacking' adult material, does IA blame the submitter or the site that got waybacked?
19:14 🔗 ola_norsk e.g if a dickpick from a twitter feed got in there..
19:22 🔗 icedice What bit rate should I convert a 256 kb/s MP3 to AAC at for it to be somewhat lossless (yeah, I know it's lossy, I just want the best quality I can get out of that export)
19:22 🔗 JAA For best quality, highest bitrate available in AAC.
19:22 🔗 JAA But really, why do you want to do that?
19:22 🔗 ola_norsk 1kb lower with constant bitrate?
19:23 🔗 ola_norsk what JAA said, why re-encode
19:23 🔗 icedice 320 kb/s is max in Adobe Premiere Pro
19:23 🔗 icedice Because I want to export the video
19:23 🔗 icedice I can't export it as an MP3 track
19:24 🔗 ola_norsk increasing bitrate on already compressed audio is futile
19:24 🔗 icedice Yeah, I figured
19:24 🔗 ola_norsk just export them separately, and use e.g ffmpeg to merge the two
19:24 🔗 JAA Yep, even with 320 kb/s AAC, you'll lose quality, but you get a larger file.
19:24 🔗 icedice I'll just go with 256 kb/s
19:25 🔗 icedice How would it be compatibility-wise? It's going into an MP4 container.
19:25 🔗 ola_norsk icedice: if you export the video, and have the original mp3, ffmpeg can merge them
19:25 🔗 JAA Where do you want to play it?
19:25 🔗 icedice idk
19:26 🔗 JAA If it's a computer, you can find a software that can handle it for sure.
19:26 🔗 JAA Embedded systems, *shrug*
19:26 🔗 icedice It wouldn't surprise me if the person who is going to get the video uploads it to YouTube and/or Facebook
19:26 🔗 ola_norsk they recode anyway i think
19:26 🔗 icedice I doubt they're into Vimeo or Dailymotion at least
19:26 🔗 icedice Yeah
19:26 🔗 icedice YouTube is AAC
19:27 🔗 icedice And I figured that they do a crappier job than Adobe Premiere Pro
19:27 🔗 schbirid has quit IRC (Quit: Leaving)
19:27 🔗 JAA They probably spent a ton of time optimising their transcoding.
19:28 🔗 JAA Whether it's "maximum quality" is a different question though.
19:28 🔗 icedice YouTube has shit quality though
19:28 🔗 JAA Yep
19:28 🔗 JAA I tell that to my girlfriend all the time, but she insists on using it anyway.
19:28 🔗 JAA (For listening to music)
19:28 🔗 ola_norsk anyway, i'm no expert by far, but i'd say #1 export the video (without audio) then use ffmpeg to merge the untouched audio with the video stream as e.g .MKV, and upload that
19:28 🔗 ola_norsk or mvk
19:28 🔗 icedice I've cut out parts of the audio track, so I'd need to do that again in like Audacity and then resave it
19:29 🔗 ola_norsk i think working on it as wav is then the best
19:29 🔗 icedice Which I don't feel like redoing (and would have trouble getting exactly right compared to the video)
19:29 🔗 icedice Yeah
19:29 🔗 icedice I think I'll just go with AAC
19:30 🔗 icedice The assignment is late enough as it is and my teacher seemed pretty pissed at me for uploading the project file instead of the video file as filler to stall the whole thing
19:31 🔗 ola_norsk i doubt a teacher would piss on audioquality if it's less than 100% shitty :D
19:31 🔗 ola_norsk not*
19:31 🔗 icedice The video is fucked enough anyway
19:33 🔗 icedice My phone stopped filming the 1 hour when it ran out of space
19:33 🔗 icedice My classmate then started filming it with his phone
19:34 🔗 icedice But the selfiestick/tripod hybrid pressed the power button and shut down his phone
19:34 🔗 icedice So another classmate continued filming
19:35 🔗 icedice TL;DR: There are two spaces in the video track that I filled with two video screenshots each since there's no video of it
19:35 🔗 icedice Luckily the external mic worked well and got all of the audio
19:35 🔗 ola_norsk if your teacher can hear the difference between 256kb/s audio and 320kb/s.. :D
19:36 🔗 icedice There's no 320 kb/s though
19:36 🔗 icedice just 256 kb/s and bloated 256 kb/s
19:36 🔗 ola_norsk i mean if you're planning to reencode the audio
19:36 🔗 icedice Yeah
19:36 🔗 icedice I'm going with 256 kb/s though
19:37 🔗 ola_norsk 128kb is CD quality..
19:37 🔗 ola_norsk aye
19:37 🔗 icedice I thought that was 192 kb/s
19:37 🔗 icedice I was just wondering if it would have been possible to go with less kb/s than 256 and still achieve the same audio quality since AAC is better quality-wise than MP3
19:38 🔗 ola_norsk there's no point in going up in kbs on an already compressed audio file though. It would just make a bigger file, with same (or even a bit shittier) quality
19:39 🔗 ola_norsk but, there's no need to even touch the audio if you have the best possible copy of it
19:40 🔗 icedice JAA: Well, YouTube is convenient. I use it for music listening as well. Even if I had a headset that was high-end enough for me to hear the difference between YouTube audio and uncompressed audio I probably wouldn't notice anyway.
19:40 🔗 icedice Yeah, well I was wondering about going down in kb/s
19:40 🔗 icedice Like 256 kb/s MP3 = how much kb/s in AAC
19:40 🔗 ola_norsk https://superuser.com/questions/277642/how-to-merge-audio-and-video-file-in-ffmpeg
19:40 🔗 icedice Quality-wise
19:41 🔗 icedice I'm pretty sure I have that in a .txt document from before
19:41 🔗 icedice It's just having to cut it again that is a pain in the ass
19:42 🔗 icedice I guess it could be possible in Avidemux if I had the time
19:42 🔗 ola_norsk '-c:a copy' would keep the audio from being recoded
19:42 🔗 icedice I still need to cut out extra shit from the audio track
19:42 🔗 icedice And it has to match the video track exactly, otherwise the lecturer will look like she's lip syncing
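The merge ola_norsk suggests (export the video without audio, then mux in the untouched audio with '-c:a copy') comes down to a single ffmpeg call. A rough sketch, assuming ffmpeg is on the PATH; the file names and function names are hypothetical. It only covers the muxing step and does nothing about the cutting and sync problem icedice describes.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes two inputs without re-encoding either."""
    return [
        "ffmpeg",
        "-i", video_path,    # input 0: the exported video
        "-i", audio_path,    # input 1: the original audio file
        "-map", "0:v:0",     # take the first video stream from input 0
        "-map", "1:a:0",     # take the first audio stream from input 1
        "-c:v", "copy",      # stream copy: video is not re-encoded
        "-c:a", "copy",      # stream copy: audio is not re-encoded
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run the command; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_mux_command("lecture_video.mp4", "lecture_audio.mp3", "lecture.mkv")))
```

Because both streams are copied, the audio keeps whatever bitrate it already had; the output container (MKV here, as ola_norsk proposes) has to support both codecs.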
19:43 🔗 JAA icedice: Well yeah, it's not just that, also that you have to rely on internet connectivity and the music you want to listen to being available on YouTube.
19:43 🔗 JAA The main reason for me personally is quality though.
19:44 🔗 ola_norsk icedice: if you're plannint to use Youtube to show it anyway, it's going to be recoded no matter what at playback
19:44 🔗 ola_norsk planning
19:44 🔗 JAA transcoded*, you mean, right?
19:44 🔗 ola_norsk that
19:47 🔗 ola_norsk and i'd be frightened of a lecturer who would exclaim "Wait just a damn minute! ..This audio is 244kb/s, not 320!!"
19:47 🔗 ola_norsk that'd be golden-ears deluxe
19:55 🔗 icedice JAA: You can download it and demux it to AAC/M4A using JDownloader 2 or Youtube-DLG though. I do that sometimes.
19:56 🔗 icedice And yeah, I should search for the FLAC files, I'm just a bit lazy with that atm
19:56 🔗 icedice I'll have some FLAC torrenting marathon some day when I have time though
19:56 🔗 ola_norsk with 'youtube-dl -k' it keeps the video and audio
19:57 🔗 icedice https://github.com/MrS0m30n3/youtube-dl-gui
19:57 🔗 icedice ^ I was talking about that youtube-dl GUI
20:00 🔗 ez youtube-dl -f bestaudio -o out.m4a
20:00 🔗 ez its horrible tho, 128k aac iirc
20:00 🔗 ola_norsk i just use -k / --keep-fragments
20:00 🔗 ez why?
20:01 🔗 ola_norsk to keep best audio present since youtube-dl often merges and deletes
20:01 🔗 ez 251 webm audio only DASH audio 142k , opus @160k, 3.59MiB
20:01 🔗 ez oh neat
20:02 🔗 ez youtube-dl -f bestaudio -o out.opus then
20:02 🔗 ez ola_norsk: huh?
20:02 🔗 ez no
20:03 🔗 ez ytdl preserves the bitstream unless you tell it to do something stupid, like output mp3
20:04 🔗 ola_norsk e.g 'youtube-dk -k <link>' will keep the audio and video separate, without deleting them when merging into container
20:04 🔗 ola_norsk 'youtube-dl -k'*
20:05 🔗 ez yea
20:05 🔗 ez and if you let it do its thing, that is just set output container compatible with the dash track, it will mux it from fragments into a single file
20:05 🔗 ez this is *not* transcoding
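The two invocations being compared here differ only in a couple of flags, which a small command builder makes visible. A minimal sketch, assuming youtube-dl is installed and on the PATH; the function name and the example URL are placeholders. In youtube-dl's option list `-k` is the short form of `--keep-video`, which keeps the downloaded streams on disk after they are merged.

```python
def build_ytdl_command(url, audio_only=False, keep_streams=False, output=None):
    """Build a youtube-dl command line as an argument list."""
    cmd = ["youtube-dl"]
    if audio_only:
        # Pick the best audio-only stream; no transcoding is requested.
        cmd += ["-f", "bestaudio"]
    if keep_streams:
        # Keep the separate video and audio files after merging.
        cmd.append("-k")
    if output is not None:
        cmd += ["-o", output]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=EXAMPLE"  # placeholder, not a real video
    print(" ".join(build_ytdl_command(url, audio_only=True, output="out.opus")))
    print(" ".join(build_ytdl_command(url, keep_streams=True)))
```

Neither variant asks for a conversion step, so in both cases the downloaded bitstreams are muxed or kept as they are, which is the point ez makes about this not being transcoding.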
20:05 🔗 ola_norsk i find it useful when archiving a youtube video that's pretty much an interesting speech/talk
20:06 🔗 ez oh
20:06 🔗 ola_norsk e.g this item https://archive.org/details/Tay_Zonday_Net_Neutrality_talk
20:06 🔗 ez your intent actually is to keep a/v separate
20:06 🔗 ez yea, that probably makes sense for talk show
20:06 🔗 ola_norsk aye
20:07 🔗 ola_norsk that way it's possible just to listen to it, since it's just talk anyway with no important visual stuff
20:21 🔗 ola_norsk speaking of which, it would be cool if IA made secondary 'mediatype' possible for items, like with 'test item' that is also listed as both main 'community <type>' and 'Test Collection' :D https://archive.org/details/superfunky59_Series_of_Tubes_Music_Video
20:22 🔗 ola_norsk this item is audio, but _does_ contain video
20:23 🔗 icedice Btw, has Internet Archive gone to Canada yet?
20:23 🔗 icedice They'd set up a backup facility there, right?
20:23 🔗 ola_norsk i think so
20:24 🔗 ola_norsk someone posted a link here once where the infrastructure could be viewed
20:25 🔗 JAA ola_norsk: That's not what the mediatype is about though. What you mean is that an item can be in multiple collections, and that's already the case I think.
20:25 🔗 ola_norsk ah ok
20:26 🔗 ola_norsk it's not in collection other than 'Community Audio', but i'm guessing IA also goes by filetypes
20:26 🔗 icedice I don't get why they'd set up shop in Canada
20:27 🔗 icedice Not counting that it's a Five Eyes country it's geographically the US' neighbour and it piggybacks on the US' soon to be much worse Internet
20:27 🔗 ola_norsk if it was made available, they should though
20:28 🔗 icedice Switzerland or something would have been better imo. Very good privacy laws and geographically distant and safe.
20:28 🔗 ola_norsk i've been sending some emails around here in norway
20:29 🔗 icedice But I'm just a random guy on the Internet with >opinions, so what do I know
20:29 🔗 ola_norsk problem is, i kind of need some sort of 'presentation'
20:30 🔗 ola_norsk i'm just a drunk fuck on an island on the westcoast of norway; So, it would need someone with a bit more 'umph' behind to be anything more
20:32 🔗 ola_norsk when i got the question (translated); 'have you talked to Brewster Kahle?' , i was damn close to writing back 'who the fuck do you think i am? All i asked you was your stance on a question!'
20:33 🔗 ola_norsk also, it seems to be a common belief that IA is merely 'waybackmachine'
20:34 🔗 ola_norsk so, a kind of official 'pitch deck' would be very nice to present
20:35 🔗 ola_norsk but, according to 'Norsk Dataforening' (which is not small)..a Norwegian mirror is 'i like the idea'
20:36 🔗 ola_norsk followed up with 'what more can you tell me about it'..and there's my problem
20:37 🔗 icedice Are you pitching a Norwegian Internet Archive mirror or am I misunderstanding something?
20:37 🔗 ola_norsk so if someone with a bit more 'panache' .. http://www.dataforeningen.no/in-english.128921.no.html
20:37 🔗 icedice Because that would be pretty sweet
20:37 🔗 ola_norsk icedice: aye
20:38 🔗 icedice I remember seeing a video on YouTube of some old mine in Norway that they had made into a data center
20:38 🔗 icedice Looked pretty sweet
20:38 🔗 ola_norsk according to their representative it's a 'good idea'..but i can not be the one carrying it further :D
20:41 🔗 ez EU IA is an interesting dilemma
20:41 🔗 ez on one hand, storage hardware almost twice as expensive
20:41 🔗 ez on another hand, bandwidth is about 4x cheaper
20:41 🔗 ola_norsk Norway is member of EU, we're kind of strange like that :D
20:41 🔗 ez i guess ia is more constrained by storage than bw tho
20:41 🔗 ola_norsk not*
20:42 🔗 ola_norsk aye, but e.g here in norway there's not only focus on preservation of old shit, but also 'green power'
20:43 🔗 ez thats probably fine, green power often means cheaper power these days
20:43 🔗 ez its funny for norway to be green obsessed, when they're arguably the biggest source of CO2 of all eu countries
20:43 🔗 ez not directly, but they originate that much oil none the less
20:44 🔗 ola_norsk hehe, that might be true, but the gasoline is still more expensive here than the same gasoline when it's exported :/
20:44 🔗 ola_norsk same source though..weird how that works :D
20:44 🔗 ez youre a nordic country
20:44 🔗 ez nordic means insane taxes
20:44 🔗 icedice The Netherlands, Germany, and France is pretty cheap for hosting though
20:44 🔗 ez except iceland (?) for some reason
20:45 🔗 ez they were never true vikings to begin with
20:45 🔗 ola_norsk they're better than the Swedes ;)
20:45 🔗 ola_norsk hehe, but damn, this is getting way off topic
20:46 🔗 ola_norsk back to topic: I've envisioned e.g https://greenmountain.no/
20:47 🔗 icedice Yeah, that's one of the data centers I was just looking at
20:47 🔗 ola_norsk that's where i started to nag first, on twitter.. ~7 months ago, never got a response..so, it needs someone bigger..And Dataforeningen is mildly put quite big
20:48 🔗 icedice https://www.youtube.com/watch?v=gYrvRMWiZCA
20:48 🔗 icedice https://www.youtube.com/watch?v=aTjF2hJiack
20:48 🔗 icedice https://www.youtube.com/watch?v=oN9on73BmSs
20:50 🔗 ola_norsk all i know is that when an official representative of NCS writes back 'I like the idea, what more can you tell me about it?'..that's no small thing
20:51 🔗 ola_norsk ".The Norwegian Computer Society turned 50 years in 2003"
20:52 🔗 ola_norsk it needs a response with equal punch though..sadly, i can not provide that :/
20:52 🔗 ola_norsk basically, every IT company of Norway is a member of NCS
20:53 🔗 ola_norsk probably Green Mountain AS as well
20:54 🔗 ola_norsk at the very least, i need some sort of presentation endorsed by IA, or someone in IA, to send
20:55 🔗 Somebody2 ola_norsk: As I think I mentioned the last time you brought this up -- you *do NOT need any permission* to mirror a whole bunch of IA.
20:55 🔗 ola_norsk can't just say 'i like backups!' :/
20:55 🔗 ola_norsk Somebody2: i need a 'pitch' though
20:55 🔗 Somebody2 Hm, not sure what you mean.
20:55 🔗 ola_norsk Somebody2: a kind of 'this is Internet Archive, and this is why our work is important'
20:56 🔗 kristian_ has quit IRC (Ping timeout: 360 seconds)
20:56 🔗 ola_norsk whether it be video, article or powerpoint slides
20:56 🔗 Somebody2 Ah. Does the existing pitch currently displayed in a large banner on every IA page not suffice?
20:56 🔗 Somebody2 What about textfiles's 30-days-of-neat-stuff-on-IA tweets?
20:57 🔗 Somebody2 Also, you could identify particular collections on IA that you want to suggest NCS mirror, and make a pitch based on those.
20:58 🔗 Somebody2 I think it would be FABULOUS if NCS dedicated a few dozen petabytes to mirroring some of the publicly downloadable parts of IA.
20:58 🔗 Somebody2 And they could do that without ANY coordination or permission from IA. Just do it, then send an email afterward going, ...
20:59 🔗 Somebody2 "Hi, thought you should know we've made a mirror of all this, if you want to direct people to it.
20:59 🔗 ola_norsk in my thought NCS would be the ones that swayed Norwegian Government to make sure a complete mirror exist
20:59 🔗 Somebody2 "And we'd be glad to mirror some of your restricted stuff, too, now that we've shown we can do the job."
21:00 🔗 Somebody2 Having NCS lobby the Norwegian government does make sense, yes.
21:00 🔗 ola_norsk aye
21:00 🔗 Somebody2 My point is just that doing that does NOT require any coordination or involvement by IA.
21:00 🔗 Somebody2 At least for the initial dozen petabytes of mirrored data.
21:01 🔗 ola_norsk whether it be educational department or culture/historical department, both of which have say in the matter
21:02 🔗 Somebody2 I'd reply back to the person who said it was a good idea, informing them that it doesn't require any coordination with IA, and ...
21:02 🔗 ez Somebody2: in case iabak got some serious traction, is it possible to count on ia's *support* of that endeavor?
21:02 🔗 Somebody2 ez: Yes, IA is supportive of mirrors, as I understand (I don't have any formal connection to them, though).
21:03 🔗 ez namely, better access to the current snapshot of ias data. the current query api works nice for individual items, but it gets awkward quick when things are done on a scale this massive
21:03 🔗 Somebody2 ez: Well, the IA census seems to work well enough.
21:03 🔗 Somebody2 You are familiar with that, right?
21:03 🔗 ola_norsk Somebody2: do you have an email where i might forward the emails to?
21:04 🔗 Somebody2 ola_norsk: what, the ones from NCS? Why do you want to forward them?
21:04 🔗 ola_norsk Somebody2: of the emails/replies i had with the person in NCS
21:04 🔗 Somebody2 Again, why forward them? You DO NOT NEED ANY HELP FROM archive.org FOR THIS.
21:05 🔗 ez Somebody2: yes, im familiar with iamine from todd's effort to timestamp it
21:05 🔗 Somebody2 ez: good
21:06 🔗 Somebody2 ola_norsk: the next step is for you (and/or the person at NCS) to write up a proposal to the Norwegian government to fund storage.
21:07 🔗 Somebody2 ola_norsk: then just use the existing torrents provided by IA to mirror a bunch of stuff.
21:07 🔗 Somebody2 (storing it on the storage paid for by the Norwegian government)
21:08 🔗 ola_norsk Somebody2: i have no connection to IA, no real say in the matter. Basically, when it comes to being a 'middle man' for getting a complete active mirror of archive.org established..I might not be the middle-man that's needed. :D
21:08 🔗 Somebody2 There IS NO MIDDLE-MAN needed, as I keep telling you.
21:09 🔗 ez Somebody2: also, i didnt see this explicitly stated anywhere, but is this data for archive.org/web/* as well
21:09 🔗 Somebody2 You don't need the permission, knowledge, or connection to ANYONE at IA to do this.
21:10 🔗 Somebody2 ez: The IA census does include hashes for the Wayback Machine data, in the "private" section (since the files aren't directly downloadable).
21:10 🔗 ez neat
21:10 🔗 ola_norsk Somebody2: how can a full copy of IA be made, functioning as a 'node' then ?
21:11 🔗 Somebody2 ola_norsk: Once a mirror of the publicly downloadable data has been made (and paid for), *THEN* reach out to IA about mirroring the rest.
21:11 🔗 ez so basically "serious iabak" would amount to 1) better access to WBM data 2) a bit saner query api to query diffs from the last time. iamine is kinda slow, and i dont see any reason for it to be
21:11 🔗 ez its just a fairly straightforward database dump
21:12 🔗 Somebody2 ez: Eh, data on IA really shouldn't be changing regularly, so no, I don't think better access to diffs is much of a problem.
21:13 🔗 ez Somebody2: i mean delta since the last time
21:13 🔗 ez ideally there would be some IA's official append log structure, not for people to awkwardly reconstruct it every time
21:14 🔗 Somebody2 As I see it, the main next step for IA.BAK is clients for more platforms, that are easier to install, and a bunch of promotion to get lots more people to sign up.
21:14 🔗 ola_norsk Somebody2: that's kind of the problem as i see it.. ME, alone, reaching out is useless. I can barely reach the toilet in time when i have to take a piss. IA, like someone here said, is not a small thing.
21:14 🔗 sep332 has quit IRC (Read error: Operation timed out)
21:14 🔗 Somebody2 ez: A better log structure would certainly be nice, but I don't think it's a blocker for IA.BAK.
21:15 🔗 ez its a blocker to do this in serverless fashion
21:15 🔗 ola_norsk Somebody2: i think the maximum of my effort and use would be to get someone in NCS and IA to contact each other and talk further
21:15 🔗 ez iabak doesnt need to be centrally coordinated, it works perfectly fine as a stochastic endeavor
21:16 🔗 ez provided the input to the backed up space is uniform
21:16 🔗 ez which it isnt atm
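[Editor's sketch] ez's point about uncoordinated backup can be illustrated roughly as follows. This is a minimal sketch, assuming every client can obtain the same authoritative list of item identifiers (which, as noted in the channel, IA does not currently publish in this form). The names `pick_items` and `expected_copies` are hypothetical and not part of any IA.BAK client.

```python
import random


def pick_items(snapshot, k, seed=None):
    """Pick k distinct item identifiers uniformly at random.

    `snapshot` is assumed to be a list holding the full, authoritative
    set of item identifiers (hypothetical input).
    """
    rng = random.Random(seed)
    return rng.sample(snapshot, k)


def expected_copies(num_clients, k, snapshot_size):
    """Expected number of copies of any single item.

    Holds when each of `num_clients` clients independently picks `k`
    items uniformly at random out of `snapshot_size` items.
    """
    return num_clients * k / snapshot_size
```

If the list the clients sample from is not the full set (or is skewed), some items are never picked at all, which is why the uniformity of the input matters for a serverless design.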
21:16 🔗 odemg has joined #archiveteam-bs
21:16 🔗 Somebody2 ola_norsk: Why, given that NCS *does not need any help from IA* in order to mirror a bunch of the content?
21:16 🔗 Somebody2 ola_norsk: I think the good use for your time and effort is to *inform* the person at NCS who you spoke to that they don't need IA's permission to mirror.
21:17 🔗 Somebody2 And encourage them to write up a grant proposal for server space.
21:18 🔗 Somebody2 ez: Let's not let the possibility of a serverless architecture block progress on an existing backup.
21:18 🔗 ez Somebody2: given that current iabak stands at 0.5% progress to backup IA, its a bit premature optimization
21:18 🔗 ez im stipulating that looser coordination would yield better number than that
21:18 🔗 ez im not interested in "better" platform support, im interested in a client which doesnt need to coordinate at all
21:19 🔗 Somebody2 ez: OK, but can you write such a client without any change to IA's existing infrastructure? If no, it's not as good as one that CAN be written that way.
21:20 🔗 ez i can provided ia provides authoritative snapshot over the domain so the randomly picked items are uniformly random
21:20 🔗 ez basically current server architecture is p much result of IA not doing that
21:20 🔗 Somebody2 Yes, but they don't (yet).
21:21 🔗 Somebody2 I also don't understand what you mean by "randomly picked items are uniformly random"
21:21 🔗 ez they must be
21:21 🔗 Somebody2 ?
21:21 🔗 Somebody2 We should also take this to the #iabak channel
21:21 🔗 ez ah
21:21 🔗 Somebody2 er, #internetarchive.bak
21:34 🔗 ola_norsk Somebody2: here, the email exchange, https://archive.org/details/InboxStordabuenprotonmail-temp-item ..I'm neither an organizer, spokesperson or lobbyist of any imaginable sort. So, if someone is able to bring it further, that would be cool.
21:35 🔗 Somebody2 Ha. OK, well thanks for opening up the dialog in any case.
21:36 🔗 ola_norsk hopefully there's someone more eloquent than me to keep it going though
21:36 🔗 BlueMaxim has joined #archiveteam-bs
21:41 🔗 Somebody2 ola_norsk: Could you at least reply once more to let them know they can move forward WITHOUT any coordination with IA?
21:41 🔗 Somebody2 I think that's not expected (most organizations keep much tighter hold of their materials than IA does)
21:42 🔗 Somebody2 so letting the person you spoke to at NCS know that would be good.
21:42 🔗 ola_norsk Somebody2: like i mentioned before, that person, like many other seems to think IA is just waybackmachine.."I'm quite aware of Waybackmachine"
21:43 🔗 Somebody2 ola_norsk: Sure; so informing them that there are petabytes of material on IA that they could arrange to mirror without ...
21:43 🔗 Somebody2 ... any coordination with IA is really good to inform them of!
21:44 🔗 ola_norsk see, that's the way beyond the level of complexity of projects, where i 'peace out!' :D
21:47 🔗 Somebody2 ola_norsk: Wait, just writing a single email saying "You can download a lot of IA without permission" is a high level of complexity?
21:47 🔗 Somebody2 How is that any more complex than the emails you already wrote?
21:49 🔗 ola_norsk because after the reply to 'could i get some more concrete information about the idea?'..i did not get response back...
21:49 🔗 Somebody2 ola_norsk: I see.
21:50 🔗 ola_norsk so sending a 2 email without having gotten response..that's no good in my book :D
21:50 🔗 ola_norsk 2nd*
21:51 🔗 Somebody2 Ha. Well, I don't feel like I should write to them, because I don't speak Norwegian and I don't live in Norway. :-(
21:52 🔗 ola_norsk pretty sure they know english ;)
21:54 🔗 ola_norsk heck, even i know proper english when i put my beer soaked mind to it, watching out for typos and colloquial terms
21:54 🔗 Somebody2 ola_norsk: Yeah, but I feel like they'd respect me less.
21:54 🔗 ola_norsk then we have a problem..
21:55 🔗 ola_norsk so, change.org then?
21:55 🔗 Somebody2 Yep, if you feel you've worn out your welcome, and we don't have any other Norwegians interested in stepping up...
21:55 🔗 ola_norsk i've heard rumours there was another one..but alas, no more than that :/
21:56 🔗 Somebody2 Oh hell, I suppose I'll write something quick up.
21:56 🔗 ola_norsk can't get shittier than mine :D
21:57 🔗 Somebody2 :-P
21:58 🔗 ola_norsk a smashingly, awesome, mezmerizing 'pitch deck' though..the kind that could say the harshest of wallstreet investors..
21:58 🔗 ola_norsk sway*
21:58 🔗 ola_norsk that would be best
21:59 🔗 ola_norsk voice over by Alex Jones.. "Here's why data preservation is important for survival of the human specie!!!"
22:00 🔗 sep332 has joined #archiveteam-bs
22:00 🔗 ola_norsk but yeah, anyone else is better than me
22:01 🔗 Harzilein has there ever been a discussion of ia trying to infer locations from javascript (in before: halting problem)?
22:02 🔗 ez you mean a crawler with headless browser?
22:03 🔗 ez its resource intensive, but a lot of crawlers already do it. if i were to guess, its not done as it would slow down crawl speed a lot.
22:04 🔗 Harzilein ez: well, possibly a way for the crawler to dump asts, then look if someone registered a way to get file locations from that ast. it's not even obfuscated nor really 'computation', just a format that currently blinds ia
22:05 🔗 Harzilein oh and that crawler would need to time travel too :/
22:05 🔗 sep332 has quit IRC (Read error: Operation timed out)
22:05 🔗 Harzilein i want to look up an image output format that my national weather service dropped when they "moved to open data" :(((
22:06 🔗 Harzilein and they only had it on ftp and clients had it on pages w/ javascript animations
22:06 🔗 ez generally, the only reliable way to run js these days is headless browser
22:06 🔗 ez phantomjs or headless chrome
22:07 🔗 ez 'halting problem' is not the issue as such, its more like 'insane, obtusely baroque web platform as a whole problem'
22:09 🔗 Somebody2 ola_norsk: OK, here's what I plan to write: https://0bin.net/paste/2+yIgWRGt6IUpjbp#twBLyDm6cHRXGXp-BYmeALC83pScu4311jR94QtyvNk
22:09 🔗 Somebody2 Please let me know any comments you have.
22:09 🔗 Harzilein this is pretty much 1990's style code (it just uses >dom0 because it needs to cache images): http://web.archive.org/web/20110119230906/http://wetter.tagesschau.de:80/radarbilder/
22:09 🔗 ola_norsk "mirroring some" ?
22:11 🔗 Harzilein those were just re-scaled versions of stuff from https://www.dwd.de/DE/leistungen/gds/gds.html, which is phased out in favour of https://opendata.dwd.de/
22:11 🔗 Somebody2 ola_norsk: Yes, that's how I read the response...
22:11 🔗 Somebody2 That they liked the idea of mirroring parts of archive.org
22:11 🔗 Somebody2 Ideally, all of it.
22:11 🔗 ola_norsk yeah
22:12 🔗 ola_norsk the email is perfect
22:12 🔗 Somebody2 But I certainly didn't see anything in their response that suggested they were *opposed* to starting by mirroring parts of it.
22:12 🔗 Somebody2 OK, cool, sending now.
22:13 🔗 ola_norsk i think IA needs some kind of public fact sheet, that shows that it's not just WayBackMachine :D
22:14 🔗 Somebody2 ola_norsk: Yeah, that would probably be good.
22:14 🔗 ola_norsk or preferably a video where kahle and scott fingers the storage while pointing it out :D
22:15 🔗 Somebody2 email sent.
22:15 🔗 ola_norsk "here's the U's of wayback, here's the U's of videos and news'
22:15 🔗 ola_norsk Somebody2: did you send just the person who responded?
22:15 🔗 Harzilein .oO( here's the U's of backups of old scene releases ;)
22:15 🔗 * Harzilein runs
22:15 🔗 Somebody2 ola_norsk: Yes.
22:15 🔗 ola_norsk ..and that
22:16 🔗 ola_norsk ok
22:16 🔗 Somebody2 Christian Torp.
22:16 🔗 ola_norsk yes he responded
22:17 🔗 ola_norsk i'm not sure what the title is called, 1 sec
22:18 🔗 Somebody2 It doesn't matter.
22:19 🔗 ola_norsk "Chief Operating Officer (COO)"
22:20 🔗 ola_norsk tone dalen did not respond when i wrote, but he did
22:21 🔗 ola_norsk welcome to bureaucracy, i guess :D
22:22 🔗 * Somebody2 shrug
22:22 🔗 ola_norsk the benefit of it though, is there so many instances to nag to :D
22:25 🔗 ola_norsk other than DCS there's also 'Arts Council' who also have a lot of say in such matters
22:25 🔗 ola_norsk http://www.kulturradet.no/english
22:29 🔗 ola_norsk Somebody2: let me know if you get a response on the email, though it's christmas now so it might take a while
22:30 🔗 schbirid has joined #archiveteam-bs
22:30 🔗 ola_norsk fucking hell, i need to gooder up my formal english writing if you do :/
22:31 🔗 ola_norsk it's sad i'm the only norwegian presently here :/
22:34 🔗 ola_norsk there's not even a swede or a dane around?
22:35 🔗 ola_norsk seriously though, i hope to hear when/if you get a response
22:37 🔗 ez Harzilein: mirroring scene releases is doable. just rent a DC-hut in marshall islands (one of the few real-countries with no copyright laws)
22:38 🔗 ez then again, unauthorized copies kinda tend to "mirror" themselves
22:40 🔗 Harzilein ez: huh?
22:40 🔗 JAA I wouldn't be surprised if IA had its own stash of those. Darked, for obvious reasons.
22:41 🔗 Harzilein that's what i'm talking about
22:41 🔗 Harzilein there's oldschool ones in unsystematic blobs. they are far more interesting ones than those with the nice emulator frontends :)
22:43 🔗 ez bbs era isnt that challenging yea. for starters, 20 years of data produced back then equals a week of data produced now.
22:46 🔗 Harzilein anyway, my angle at this was its hard to get to "our" 10000 feet view that this is just "niche" data like any other, despite the provenance
22:46 🔗 Harzilein +across
22:46 🔗 Harzilein -across
22:47 🔗 kimmer1 ola_norsk a Dane here.. cheers
22:48 🔗 ez Harzilein: most of it is garbage like current warez. the important bits (demo and tracker scene) is how IA actually started, didnt it?
22:49 🔗 ola_norsk kimmer1: skål :)
22:50 🔗 ez Harzilein: if were talking cultural heritage wrt piracy, there are certain niches in that niche where archiving would be of very high value. things like St.GIGA games.
22:50 🔗 ez its a bit like "pirate" recordings of tv shows which otherwise are long lost in the history.
22:50 🔗 icedice ola_norsk: Swedish speaking Finn here
22:51 🔗 ola_norsk ola_norsk: perkele :D
22:51 🔗 ola_norsk oops
22:51 🔗 icedice That's my reaction to Finnish
22:51 🔗 ola_norsk well, that's twice as good as an actual swede :D
22:52 🔗 ola_norsk anyway, if there's some south americans and some asians, the globe is covered :D
22:53 🔗 icedice ez: Private Layer is what a lot of pirate sites use for hosting. It's a Panamanian company that has servers in Switzerland (which is a pirate)
22:53 🔗 icedice 's paradise)
22:54 🔗 ez icedice: yes, there are a few shady isps catering to the unsavory markets
22:54 🔗 ez its funny how the actual scene on one hand shuns commercial sites (hosted in places like you mention), and on the other hand it thrives on it
22:55 🔗 ez icedice: the supposed ethos is to stay under the radar. being herded by a provider "look, you can host your botnet/whatever here" is p much the opposite of that.
22:55 🔗 ez markets can sure play out in fun way
22:57 🔗 JAA This is getting too offtopic for this channel. Mind moving to #archiveteam-ot?
22:58 🔗 icedice They're a bit too shady for my taste nowadays though: https://www.lowendtalk.com/discussion/71510/grupo-panaglobal-15-s-a-private-layer-drama-allegedly-james-reed-mccreary-alpha-red
22:58 🔗 icedice Isn't #archiveteam-bs meant for off-topic stuff like this?
22:58 🔗 M9uy3 has joined #archiveteam-bs
22:58 🔗 M9uy3 hi, how to start that project? https://www.archiveteam.org/index.php?title=Blog.pl
22:59 🔗 ola_norsk M9uy3: 1 sec
23:00 🔗 JAA Hey M9uy3. So the first step would be to get a list of all the blogs hosted on blog.pl.
23:00 🔗 JAA That would probably mean grabbing all of http://www.blog.pl/katalog and creating a list out of that.
23:00 🔗 ola_norsk it doesn't seem to be a specific task made for it "yet"
23:01 🔗 M9uy3 ok, only URLs?
23:01 🔗 JAA What do the numbers there on the left mean? Are those numbers of blogs in the respective categories?
23:01 🔗 JAA If so, we're talking about millions of blogs.
23:01 🔗 icedice ez: What other countries are there that have no copyright laws?
23:01 🔗 M9uy3 7752304 blogs in all categories
23:01 🔗 JAA Oh dear.
23:01 🔗 icedice I think I've heard that Montenegro has none, at least
23:01 🔗 M9uy3 ;)
23:02 🔗 M9uy3 it will be a great crash
23:02 🔗 JAA So more than every sixth Pole has a blog there??
23:02 🔗 JAA (On average)
23:03 🔗 M9uy3 the project is online since 2001
23:03 🔗 JAA Hmm, ok, we'll have to think about how to do this then.
23:03 🔗 JAA They shut down end of January, right?
23:04 🔗 M9uy3 yes, 31st
23:04 🔗 JAA Mhm
23:05 🔗 JAA The links on /katalog appear to point at the newest post for each blog. That might be a good starting point.
23:06 🔗 JAA The image links, I mean.
23:06 🔗 JAA We're probably looking at billions of links in total though. :-|
23:07 🔗 M9uy3 the site is called 'blog.pl' but one can find there even school websites http://pspwasosz10.blog.pl/ :/
23:07 🔗 ola_norsk each of those links, linking to an internal thingy/image, usually has a single identifier, does it not?
23:08 🔗 JAA ola_norsk: What do you mean?
23:09 🔗 ola_norsk i mean, instead of billions of links, some of that billion might all be linking to same e.g picture/post etc
23:09 🔗 ola_norsk stored on that domain, i mean
23:09 🔗 JAA No, the billions I mean are probably unique, though quite many of them might be 404s.
23:10 🔗 JAA I mean links like http://reniablicharz.blog.pl/?p=1660
23:10 🔗 JAA Changing the p parameter leads you to other posts.
23:10 🔗 JAA The next lower value that exists is 1654.
23:10 🔗 JAA Which redirects to the second-newest blog post.
23:10 🔗 JAA And so on.
23:11 🔗 JAA The canonical post URLs look different and consist of a date and a slug, arranged as /YYYY/MM/DD/slug.
23:11 🔗 JAA (Plus a slash at the end)
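[Editor's sketch] The two URL shapes JAA describes (the short `?p=N` form and the canonical `/YYYY/MM/DD/slug/` form) could be handled roughly like this. It is only an illustration: the helper names are hypothetical, `example.blog.pl` is a made-up host, and the regex assumes nothing about blog.pl beyond the path layout stated above.

```python
import re
from urllib.parse import urlsplit

# Canonical post path as described in the channel: /YYYY/MM/DD/slug/
CANONICAL_PATH = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/$")


def short_post_url(blog_host, post_id):
    """Build the short ?p=N form for a given blog host and numeric post id."""
    return f"http://{blog_host}/?p={post_id}"


def parse_canonical(url):
    """Return host, date and slug for a canonical post URL, else None."""
    parts = urlsplit(url)
    match = CANONICAL_PATH.match(parts.path)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    return {"host": parts.hostname, "date": f"{year}-{month}-{day}", "slug": slug}
```

Since not every value of `p` exists (1660 and 1654 are neighbours in JAA's example), enumerating the short form means many requests that hit nothing, which is part of why the total link count is so large.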
23:11 🔗 JAA This will have to be a warrior project, but even then I'm not sure it's feasible. This thing is fucking *massive*.
23:12 🔗 ola_norsk what if "Grupa Onet.pl SA" was willing to just give all the shit by closing time?
23:13 🔗 ola_norsk that could save a bit of work
23:13 🔗 M9uy3 you mean export somehow?
23:13 🔗 ola_norsk aye
23:13 🔗 ola_norsk it doesn't hurt to ask
23:13 🔗 JAA Feel free to do so.
23:13 🔗 ola_norsk (or demand, rudely) :D
23:14 🔗 ola_norsk kurwa, i do not speak polish :D
23:14 🔗 JAA "Give it to us, or we'll DDoS you!" :-P
23:14 🔗 ola_norsk that
23:14 🔗 M9uy3 i've been already in contact with them today because of the second shutting down :)
23:14 🔗 JAA Which would actually not be too far from the truth lol.
23:14 🔗 ez JAA: the blogs are just wordpress, nothing too spectacular there
23:15 🔗 ola_norsk "Give it to us, or we'll DDoS your future endeavours!"
23:15 🔗 ez the issue indeed is how to get the subdomain urls
23:15 🔗 ez there are blogid and blog_id entries, but no api call translating id to subdomain has been found yet
23:15 🔗 Somebody2 ola_norsk: I will of course mention in the channel if I get a response.
23:15 🔗 ez unfortunately the front page cant be scraped, it limits paging numbers to 100
23:16 🔗 JAA ez: Yep, I know. This will be a good test for archiving Wordpress.com, which I assume will have to happen at some point and will be an absolute shitfest.
23:16 🔗 JAA At least we have an easy way to find all blogs there though (through the wp.me shortener; we did that in URLTeam a while ago).
23:17 🔗 JAA No such shortener here, unfortunately.
23:17 🔗 M9uy3 so, there is a need of URL list and/or a possibility to reupload the content somewhere?
23:18 🔗 ez the mirroring itself is doable within the timeframe
23:18 🔗 ez the issue is how to find what to mirror
23:18 🔗 ez ie writing a category spider for the front page is probably the best one could do for now
23:18 🔗 ez unless better api is reverse engineered
23:18 🔗 ola_norsk Somebody2: good stuff. I think you will get answer, though maybe now at christmas time was the _worst_ time to write a mail to an organization :D
23:19 🔗 JAA ez: I don't see an API anywhere...?
23:20 🔗 M9uy3 there is no API
23:20 🔗 ez JAA: there isnt
23:20 🔗 ez the id is in javascript for ad serving
23:20 🔗 ez so we know there *is* in fact numerical id per blog
23:21 🔗 ola_norsk Somebody2: if not; If there's no response..which there were in my case, there's no fauly in eventually sending a new mail after a while
23:21 🔗 JAA Right
23:21 🔗 M9uy3 i wrote today earlier to them but the time is bad for such contacts (christmas)
23:21 🔗 ola_norsk Somebody2: fault*
23:22 🔗 ez JAA: however everything on the frontpage seems to use subdomain urls
23:22 🔗 M9uy3 I asked them for URL list for http://republika.onet.pl/ subdomains - another project to be down (in March) - less blog, more in type of 'Geocities'
23:23 🔗 JAA No occurrence of blog_id in any of the JS included on blogs either.
23:23 🔗 ez its directly in the html
23:24 🔗 ez just viewsource
23:24 🔗 ez hmm, http://www.blog.pl/data/cache/thumb_270x200/data/post-images/74105/79552.jpeg
23:24 🔗 JAA Yeah, I mean it isn't used anywhere.
23:24 🔗 ez var dige_vars = {"homepage_url":"www.blog.pl","category":"Spo\u0142ecze\u0144stwo","admin_url":"http:\/\/zarzadzanie.blog.pl\/krolowa-superstar.blog.pl\/wp-admin\/","template":"mystique","addthis":{"selector":".post-content","action":"append"},"blog_id":74105,"p
23:25 🔗 ez so the thumbs use blogid/postid format
23:25 🔗 Somebody2 ola_norsk: Well, there isn't really any *need* for them to respond to me -- that was the main point of my email. :-)
23:25 🔗 ez which is all nice, if there were something, anything, which could translate id to blog url
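[Editor's sketch] What ez describes (a numeric `blog_id` embedded in the page's `dige_vars` JavaScript, and thumbnail URLs that appear to follow a blogid/postid layout) could be extracted roughly as below. The function names are hypothetical, and reading the second number in the thumbnail path as a post id follows ez's interpretation in the channel rather than anything documented by the site.

```python
import re

# Matches the "blog_id":74105 entry seen in the dige_vars snippet above.
BLOG_ID = re.compile(r'"blog_id"\s*:\s*(\d+)')

# Matches .../data/post-images/<blog id>/<post id>.<extension>
THUMB = re.compile(r"/data/post-images/(\d+)/(\d+)\.\w+$")


def blog_id_from_html(html):
    """Return the numeric blog_id found in a page's HTML, or None."""
    match = BLOG_ID.search(html)
    return int(match.group(1)) if match else None


def ids_from_thumb(url):
    """Return (blog_id, post_id) parsed from a thumbnail URL, or None."""
    match = THUMB.search(url)
    return (int(match.group(1)), int(match.group(2))) if match else None
```

A regex is used instead of a JSON parser because the snippet quoted in the channel is truncated. This only goes from a known blog page to its id; the reverse mapping (id to subdomain) is exactly the piece that is still missing.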
23:25 🔗 JAA We should make a channel for this.
23:25 🔗 JAA We'll need one down the line anyway.
23:25 🔗 Somebody2 They can simply go forward with getting funding for storage, then download a bunch of IA's stuff, and happily sit on it. :-)
23:25 🔗 ez #blog.pls ?
23:26 🔗 Somebody2 I'd hope they'd drop me (and info@archive.org) a quick note to say, "Hi, we've made a copy of 10PB of stuff, thanks for making it available!" -- but it's not required...
23:27 🔗 ola_norsk Somebody2: i'm not sure what you mean by that, but NCS is the major computer/it association in Norway. Basically every computer/tech related company is member...I guess it's kind of like the NRA of computer stuff here
23:28 🔗 Somebody2 ola_norsk: What I mean is that the point of my email was that NCS does not need to talk to me any more in order to mirror IA.
23:28 🔗 Somebody2 So if they don't respond, it doesn't mean they aren't, you know, mirroring IA.
23:28 🔗 ola_norsk aye, they shouldn't..they can email archive.org themselves damnit
23:30 🔗 ola_norsk "what more can you tell me about the idea"..the fuckers should know how to google
23:30 🔗 Somebody2 ola_norsk: They don't need to email archive.org EITHER.
23:30 🔗 Somebody2 That was what I keep trying to point out to you!
23:31 🔗 ola_norsk they sure as hell don't need to ask me about 'something more concrete' though :D
23:31 🔗 icedice has quit IRC (Quit: Leaving)
23:31 🔗 Somebody2 I think the "something more concrete" was hopefully along the lines of what I suggested. :-)
23:32 🔗 ola_norsk or, maybe i should have just said straight up: It might need a couple of square meter of datalockers and racks
23:32 🔗 * Somebody2 going AFK
23:32 🔗 Somebody2 Yes, that probably would have been good. :-)
23:33 🔗 ola_norsk aye, but my english, or rather, my technical norwegian is not that proficient
23:34 🔗 JAA ez: Sounds good to me.
23:36 🔗 JAA >>>>> Discussion on archiving blog.pl is now going on in #blog.pls
23:40 🔗 ola_norsk Somebody2: the best i can do is try to get people and organizations with 'sway' to consider it :/
23:40 🔗 ola_norsk Somebody2: for all i know, e.g UIO.no have already pitched the idea..
23:46 🔗 ola_norsk geographical location, political standing globally, and its focus on 'green energy' and the somewhat hysterical habit of wanting to preserve old useless shit..would be a plus
23:49 🔗 icedice has joined #archiveteam-bs
23:51 🔗 ola_norsk in norway, trying to build something new close to e.g even an old wooden gate is sometimes a cause for years of controversy :D
23:53 🔗 M9uy3 has quit IRC (Ping timeout: 260 seconds)
23:55 🔗 ola_norsk even 80s and 90s graffiti is at times deemed protected as 'cultural heritage'
23:57 🔗 ola_norsk the sad effect is, all the books in local libraries are old as fuck :/
23:59 🔗 ola_norsk has quit IRC (I never hurts to ask. Merry christmas! https://youtu.be/wmin5WkOuPw)
