#archiveteam-bs 2019-01-03,Thu

Time Nickname Message
00:13 wp494 has quit IRC (Ping timeout: 633 seconds)
00:13 wp494 has joined #archiveteam-bs
00:15 hdch has joined #archiveteam-bs
01:40 jut has quit IRC (Read error: Connection reset by peer)
01:45 jut has joined #archiveteam-bs
02:41 DFJustin has quit IRC (Ping timeout: 260 seconds)
04:06 jodizzle JAA: Anything that can be done to help save the UOL forums?
04:08 exoire has quit IRC (Read error: Operation timed out)
04:08 JAA jodizzle: Idk. The only hope to grab all of it at this point would probably be a warrior project, and that would have to start very soon.
04:09 JAA The footer claims 7.4M threads with 180M posts...
04:09 JAA I get less than that adding up the numbers, but it's still pretty large.
04:10 JAA And it's a custom forum software, so we don't have existing code we can reuse.
04:13 jodizzle That's unfortunate.
04:14 jodizzle I'd be willing to try some targeted/hand scraping, but unfortunately I don't have any experience writing Warrior projects.
04:15 jodizzle And it sounds like the issue is mostly just that the site is so large?
04:16 jodizzle Well, I guess that's usually the issue...
04:21 guest has joined #archiveteam-bs
04:36 qw3rty115 has joined #archiveteam-bs
04:40 Somebody2 ... continuing the not a really good idea conversation about keeping local copies of things ...
04:40 guest i get that it'd be a problem with something illegal or very dangerous (criticism of cartels or islamic terrorism) but the truecrypt website just contained things like the format used, user manuals, etc.
04:41 JAA (Just FYI, this channel is also logged publicly and well-known.)
04:42 qw3rty114 has quit IRC (Read error: Operation timed out)
04:42 guest is there some sort of drama behind truecrypt shutting down that i'm not aware of? i thought they just wanted the project to die. i don't think they care about going after people with backups of the site
04:43 guest to the best of my knowledge at least
04:44 Somebody2 there was a LOT of drama around truecrypt shutting down
04:44 odemgi_ has joined #archiveteam-bs
04:44 Somebody2 I *think* it's reasonably well explained on the wikipedia page?
04:44 Somebody2 (haven't checked recently)
04:44 guest ok checking wikipedia
04:45 JAA Not much drama on there, but yeah, the shutdown was weird.
04:45 guest it was weird
04:46 JAA It seems that the authors pretty much want everyone to forget that TrueCrypt ever existed or something.
04:46 guest my guess (personally) was that they were getting harassed by law enforcement and just didn't want to deal with it anymore
04:46 guest but at least veracrypt is a thing now and is well maintained
04:46 odemgi has quit IRC (Read error: Operation timed out)
04:46 odemg has quit IRC (Ping timeout: 265 seconds)
04:58 odemg has joined #archiveteam-bs
05:19 ndiddy has quit IRC (Quit: nighty night)
05:38 DFJustin has joined #archiveteam-bs
05:38 swebb sets mode: +o DFJustin
06:24 riley has joined #archiveteam-bs
06:45 jut has quit IRC (Ping timeout: 252 seconds)
06:46 jut has joined #archiveteam-bs
07:54 hdch has quit IRC (Ping timeout: 265 seconds)
08:03 marked What's the URL for UOL forums?
08:05 marked Nm it's on Reddit
08:16 hdch has joined #archiveteam-bs
08:16 tomaspark has quit IRC (Read error: Connection reset by peer)
08:22 LFlare has joined #archiveteam-bs
08:28 macrosoft has joined #archiveteam-bs
08:29 marked Ha, I forgot mil is 1000
09:07 wp494 has quit IRC (Ping timeout: 265 seconds)
09:07 wp494 has joined #archiveteam-bs
09:31 ubahn has joined #archiveteam-bs
09:40 ubahn_ has joined #archiveteam-bs
09:42 ubahn has quit IRC (Ping timeout: 260 seconds)
09:44 marked On beta UOL, the index pages are truncated by a lot. What is ArchiveBot doing?
09:46 Kaz following orders, for better or worse
09:48 exoire has joined #archiveteam-bs
09:58 hook54321 has quit IRC (Quit: Connection closed for inactivity)
10:11 exoire has quit IRC (Read error: Operation timed out)
10:22 ubahn has joined #archiveteam-bs
10:23 ubahn_ has quit IRC (Read error: Operation timed out)
10:26 ubahn_ has joined #archiveteam-bs
10:26 marked UOL seems small. Easier to not use warriors, imho
10:28 ubahn has quit IRC (Ping timeout: 360 seconds)
11:09 hdch has quit IRC (Ping timeout: 265 seconds)
11:56 odemgi_ has quit IRC (Remote host closed the connection)
12:07 BlueMax has quit IRC (Quit: Leaving)
12:10 hook54321 has joined #archiveteam-bs
12:51 macrosoft has quit IRC (Ping timeout: 265 seconds)
13:21 benjinsmi has joined #archiveteam-bs
13:25 benjins has quit IRC (Read error: Operation timed out)
13:30 jut has quit IRC (Ping timeout: 252 seconds)
13:32 jut has joined #archiveteam-bs
13:40 Hani has quit IRC (Read error: Connection reset by peer)
14:38 Despatche has joined #archiveteam-bs
14:39 guest has quit IRC (Quit: y ppl s)
14:53 n00b928 has joined #archiveteam-bs
14:53 n00b928 has quit IRC (Client Quit)
15:38 hyku has joined #archiveteam-bs
15:47 hyku From my understanding, this is the place for help, correct?
16:10 Hani has joined #archiveteam-bs
16:29 JAA Does anyone in BE/NL have space for 30 tonnes of books? https://old.reddit.com/r/Archiveteam/comments/ac2f8j/about_30_tones_of_technical_books_and_articles/
16:30 JAA The Vlaamse Vereniging voor Industriële Archeologie needs to get rid of 50k books and magazines.
16:37 Mateon1 has quit IRC (Ping timeout: 255 seconds)
17:00 benjinsmi has quit IRC (Leaving)
17:00 benjins has joined #archiveteam-bs
17:34 ubahn_ has quit IRC (Quit: ubahn_)
17:49 hook54321 has quit IRC (Quit: Connection closed for inactivity)
17:57 arkiver SketchCow: see what JAA wrote ^
17:58 arkiver what are these UOL forums
17:58 arkiver no page on the wiki
17:59 JAA arkiver: http://forum.jogos.uol.com.br/ http://forum.esporte.uol.com.br/ http://forum.televisao.uol.com.br/ http://forum.tecnologia.uol.com.br/ http://beta.forum.jogos.uol.com.br/
18:00 JAA I'll create a wiki page.
18:00 jut has quit IRC (Ping timeout: 252 seconds)
18:03 jut has joined #archiveteam-bs
18:08 wp494 has quit IRC (Ping timeout: 260 seconds)
18:08 wp494 has joined #archiveteam-bs
18:09 arkiver bringing the books up with IA
18:23 JAA arkiver: The Televisão and Tecnologia forums were archived through ArchiveBot in November, and the archives seem to be complete (based on random clicking around in the WBM). The Esporte forums job back then crashed, but my new job yesterday seems to have completed; I didn't check yet how complete that is though. The jobs for the two Jogos forums are still running, and those are by far the largest
18:23 JAA ones.
18:23 JAA It's actually one forum, but some subforums are on the beta subdomain for whatever reason.
18:23 ubahn has joined #archiveteam-bs
18:23 JAA It seems that thread IDs are shared between the first four forums I linked, and the beta thing uses separate IDs.
18:23 JAA I'm not 100 % sure on that though.
18:24 ubahn has quit IRC (Client Quit)
18:24 ubahn has joined #archiveteam-bs
18:24 JAA "Shared IDs" means that IDs are globally unique rather than per forum. However, each ID only works on the specific forum where it exists.
18:25 ubahn has quit IRC (Client Quit)
18:25 JAA Thread IDs go to roughly 4.5 million on those four forums, despite the footer claiming that there are 7.4 million threads. (The beta forums are small in comparison at only 213k threads.)
18:26 JAA Not sure where that discrepancy comes from.
18:52 arkiver JAA: we can easily archive these threads, for example http://beta.forum.jogos.uol.com.br/t/14913819/bannon-e-eduardo-bolsonaro-encontro and http://beta.forum.jogos.uol.com.br/t/14913819/
18:56 arkiver and http://forum.jogos.uol.com.br/qual-console-comprar-ps4-ou-xbox-one_t_3341884 as http://forum.jogos.uol.com.br/_t_3341884
18:56 arkiver from http://forum.jogos.uol.com.br/_t_3341884 we can get http://forum.jogos.uol.com.br/qual-console-comprar-ps4-ou-xbox-one_t_3341884 again
18:57 arkiver but not sure we can get http://beta.forum.jogos.uol.com.br/t/14913819/bannon-e-eduardo-bolsonaro-encontro from http://beta.forum.jogos.uol.com.br/t/14913819/
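Both URL shapes in arkiver's examples carry the numeric thread ID, so a slug-less URL can be derived from either one. Below is a minimal Python sketch of that normalisation. It assumes the `_t_<id>` suffix (main forums) and the `/t/<id>/` path (beta subdomain) are the only thread URL forms; that is inferred from the examples above, not checked against the site. The function names are hypothetical, and this is not code from the uolforums-grab repository.

```python
import re
from typing import Optional, Tuple

# Main forums (assumed form): http://forum.<section>.uol.com.br/<optional-slug>_t_<id>
MAIN_THREAD = re.compile(r"^https?://(forum\.[a-z]+\.uol\.com\.br)/[^/?#]*_t_(\d+)$")
# Beta forum (assumed form): http://beta.forum.jogos.uol.com.br/t/<id>/<optional-slug>
BETA_THREAD = re.compile(r"^https?://(beta\.forum\.jogos\.uol\.com\.br)/t/(\d+)(?:/[^/?#]*)?$")


def parse_thread_url(url: str) -> Optional[Tuple[str, int]]:
    """Return (host, thread_id) for a thread URL, or None if it doesn't match."""
    for pattern in (MAIN_THREAD, BETA_THREAD):
        match = pattern.match(url)
        if match:
            return match.group(1), int(match.group(2))
    return None


def short_thread_url(url: str) -> Optional[str]:
    """Build the slug-less form of a thread URL (assumed to be accepted by the site)."""
    parsed = parse_thread_url(url)
    if parsed is None:
        return None
    host, thread_id = parsed
    if host.startswith("beta."):
        return f"http://{host}/t/{thread_id}/"
    return f"http://{host}/_t_{thread_id}"
```

Going back from a short URL to the slugged one, which arkiver says works on the main forums, needs a request to the live site, so it isn't sketched here.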
19:00 SketchCow https://archive.org/details/@yurigagarin Look at this lad, this absolute unit
19:04 Mateon1 has joined #archiveteam-bs
19:09 Despatche has quit IRC (Quit: Error: Connection reset by peer)
19:21 arkiver JAA: making a project for at least http://forum.jogos.uol.com.br/
19:21 arkiver can I make you admin?
19:26 hook54321 has joined #archiveteam-bs
19:27 chimyatta has quit IRC (Ping timeout: 252 seconds)
19:28 chimyatta has joined #archiveteam-bs
19:35 Stilett0 has joined #archiveteam-bs
19:36 Stiletto has quit IRC (Read error: Operation timed out)
19:51 PurpleSym arkiver: The job for beta.* seems to be coming to an end now. efutp9w41j8cssacnrxgiycqp
19:51 arkiver good
19:52 arkiver and the not-beta?
20:05 arkiver https://tracker.archiveteam.org/uolforums/
20:05 arkiver https://github.com/ArchiveTeam/uolforums-grab
20:08 JAA arkiver: Sure. I won't have much time for coding or testing, but I can help with the queue management.
20:09 arkiver yeah, don't have time for the queue stuff
20:09 arkiver I'm adding 100000 items now, and then you can do the rest
20:10 JAA Ack
20:10 JAA The non-beta AB job is nowhere near finishing, by the way.
20:11 JAA And I'm not entirely convinced the beta AB job got everything. Either it errored out on the pagination and will still continue for a while, or it probably missed some stuff. 700k URLs for 213k threads just doesn't seem right.
20:12 arkiver yeah
20:13 arkiver shall I queue thread 0-999999 for jogos now?
20:13 arkiver threads*
20:18 arkiver queued.
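arkiver queues whole ranges of thread IDs rather than naming threads one by one. A minimal Python sketch of splitting such a range into fixed-size items follows; the `<prefix>:<start>-<end>` naming and the batch size are hypothetical, since the log doesn't say how the uolforums items are actually encoded.

```python
from typing import Iterator


def make_items(prefix: str, first_id: int, last_id: int, per_item: int) -> Iterator[str]:
    """Yield item names covering thread IDs first_id..last_id inclusive.

    The "<prefix>:<start>-<end>" format is a hypothetical naming scheme,
    not the one used by the uolforums tracker.
    """
    if per_item < 1:
        raise ValueError("per_item must be positive")
    start = first_id
    while start <= last_id:
        end = min(start + per_item - 1, last_id)  # last chunk may be shorter
        yield f"{prefix}:{start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # 10 IDs per item is an illustrative choice only.
    items = list(make_items("jogos", 0, 999999, 10))
    print(len(items), items[0], items[-1])  # 100000 jogos:0-9 jogos:999990-999999
```

Larger items mean fewer tracker entries; smaller ones mean less work is redone when a single item fails.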
20:18 arkiver FOS is the target
20:19 arkiver now the default project
20:40 JAA Sweet, thanks.
21:00 adinbied has joined #archiveteam-bs
21:00 BasDub has joined #archiveteam-bs
21:02 Mateon1 has quit IRC (Read error: Operation timed out)
21:03 DasBub has quit IRC (Read error: Operation timed out)
21:08 Mateon1 has joined #archiveteam-bs
21:10 VerfiedJ has joined #archiveteam-bs
21:15 macrosoft has joined #archiveteam-bs
21:19 adinbied has quit IRC (Quit: Leaving)
21:22 antomatic has quit IRC (Read error: Operation timed out)
21:33 DosBob has joined #archiveteam-bs
21:37 BasDub has quit IRC (Ping timeout: 252 seconds)
21:37 DosBob is now known as DasBub
21:39 Stiletto has joined #archiveteam-bs
21:45 Stilett0 has quit IRC (Read error: Operation timed out)
21:49 tomaspark has joined #archiveteam-bs
22:05 Dj-Wawa has joined #archiveteam-bs
22:08 JAA arkiver: Can I have access to uolforums-items (and maybe also -grab)?
22:12 Wizzito has joined #archiveteam-bs
22:12 Wizzito So many 0/0.1 MBs on the UOL tracker
22:12 Wizzito ... we've done 2 GB so far, eh?
22:16 antomatic has joined #archiveteam-bs
22:16 swebb sets mode: +o antomatic
22:19 arkiver most threads seem empty
22:26 BlueMax has joined #archiveteam-bs
22:27 Stilett0 has joined #archiveteam-bs
22:29 Stiletto has quit IRC (Read error: Operation timed out)
22:30 astrid yes Wizzito we try to keep #archiveteam quiet, so people who aren't very active can catch up on new stuff
22:30 Wizzito ok
22:30 astrid without having to read past pages and pages of random discussions
22:34 DasBub has quit IRC (Quit: rebeught)
22:35 JAA The small items seem legit. Even very old threads seem to use larger IDs. I've only seen a handful of threads under 1 million.
22:36 DasBub has joined #archiveteam-bs
22:37 JAA arkiver: Access to uolforums-items please? We're already halfway done with the first 100k items. :-)
22:38 exoire has joined #archiveteam-bs
22:39 JAA Also, http://jsuol.com.br/p/forum/j/funcoes_admin.js?1.1.17 is a candidate for the ignore list.
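One way to express that ignore is a regular expression that also tolerates the changing version query string (`?1.1.17`). A minimal Python sketch; the pattern list and function name are hypothetical, not taken from the uolforums-grab repository.

```python
import re

# Hypothetical ignore list. The first pattern covers funcoes_admin.js
# with or without a version query string such as "?1.1.17".
IGNORE_PATTERNS = [
    re.compile(r"^https?://jsuol\.com\.br/p/forum/j/funcoes_admin\.js(?:\?.*)?$"),
]


def is_ignored(url: str) -> bool:
    """Return True if the URL matches any ignore pattern and should be skipped."""
    return any(pattern.match(url) for pattern in IGNORE_PATTERNS)
```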
22:40 JAA Hmm, what's with errors like this? http://forum.jogos.uol.com.br/_t_221480 "Erro buscando a página 1 do tópico 221480" ("Error fetching page 1 of topic 221480")
22:40 JAA Apparently that's returned as an HTTP 200.
22:41 JAA Wait no, now I get a redirect to the homepage.
22:45 JAA Reducing the rate to see if we get fewer of those errors then.
22:46 JAA arkiver: ^ What do we want to do about that?
22:46 JAA Abort when there's such an error?
22:55 arkiver yes
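The abort agreed on here has to look at the page body, because the status code alone doesn't reveal the problem (JAA saw the error page served as HTTP 200). A minimal Python sketch, assuming the body has already been decoded to text and that the message appears literally rather than as HTML entities; the exception class and function name are hypothetical, not the project's actual fix.

```python
import re

# Soft error text served in place of a thread, e.g.
# "Erro buscando a página 1 do tópico 221480"
# ("Error fetching page 1 of topic 221480").
ERROR_RE = re.compile(r"Erro buscando a página (\d+) do tópico (\d+)")


class ThreadFetchError(Exception):
    """Hypothetical exception signalling that the item should be aborted."""


def check_thread_page(body: str) -> None:
    """Raise ThreadFetchError if the page body is the forum's error message.

    The status code is not consulted, because the error page was
    observed being returned with HTTP 200.
    """
    match = ERROR_RE.search(body)
    if match:
        page, topic = match.groups()
        raise ThreadFetchError(f"error page for topic {topic}, page {page}")
```

Raising instead of saving the page means the error page isn't archived as if it were the thread, and the failed item can presumably be handed out again later.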
22:56 JAA I think I've seen a different error before as well, but I don't remember the message.
22:56 arkiver you have access now
22:56 Stilett0 has quit IRC (Ping timeout: 246 seconds)
22:58 JAA Here's an example HTML page with that error from http://forum.jogos.uol.com.br/_t_692873 : https://transfer.sh/aGKYU/F%C3%B3rum%20UOL%20Jogos%20::%20%C3%8Dndice%20do%20f%C3%B3rum.html
23:00 Stiletto has joined #archiveteam-bs
23:19 Stilett0 has joined #archiveteam-bs
23:21 Stiletto has quit IRC (Read error: Operation timed out)
23:26 JAA Pausing the project. arkiver, can you add a fix for that error?
23:27 Wizzito Oh, we're pausing the UOL grab? Can I turn off my Warrior if that's the case?
23:31 JAA Just tried provoking that error under load on an existing topic, but that didn't seem to work. Not sure if that means we didn't miss content though. In any case, we should abort when we get the error probably.
23:32 JAA Wizzito: You can do that whenever you want, or you can let it run and automatically resume when we're ready again. Up to you really.
23:33 yano is there a channel for the UOL project?
23:38 JAA Nope
23:38 JAA Considering more of UOL will go down soon most likely (as mentioned earlier in #archiveteam), we might want to create one though.
23:40 JAA Any good ideas for a channel name?
23:49 chimyatta has quit IRC (Quit: quitting)